PREFACE


Four  years  and  eleven  months  have  passed  since  this  research  period  started  on  January  1, 
1990.  During  this  period,  effort  was  focused  on: 

1. further development of theory and methodologies in latent trait models, including the
proposal of a new model, called the acceleration model,

2. development of methodologies for cognitive diagnosis assessment applying and expanding
latent trait models, which were eventually integrated into a method called the
competency space approach,

3.  theoretical  integration  of  the  nonparametric  approaches  and  methods  developed  in 
the  past  years,  and 

4. writing up research outcomes in refereed journal papers and book chapters, and presenting
them at international and domestic conferences.

During the research period many people on the University of Tennessee, Knoxville campus,
including the Acting Director of the Computing Center, Mr. Bruce H. Delaney, helped me in
conducting research; their help is greatly appreciated. I would also like to express my
gratitude to the people of the Office of Naval Research, especially the scientific officers
Dr. Charles E. Davis and Dr. Susan E. Chipman, and the ONR representatives in Atlanta,
including Mr. Thomas Bryant and Ms. C. C. Everley.

Special  thanks  are  due  to  my  graduate  assistant,  Mr.  Christopher  Coleman,  for  helping  me 
in  preparing  this  final  research  report. 


November  25 
Author 


TABLE  OF  CONTENTS 


I. Introduction

[I.1] Book Chapters
[I.2] Refereed Journal Papers
[I.3] Proceedings Article
[I.4] Papers in Preparation for Refereed Journals
[I.5] Invited Paper Presentations at International Conferences
[I.6] Invited Paper Presentations at Domestic Conferences
[I.7] Contributed Paper Presentations at Domestic Conferences
[I.8] Seminars
[I.9] Awards

II. Competency Space Approach to Cognitive Diagnosis

[II.1] Method
[II.2] Domain Knowledge “Plus”
[II.3] Preliminary Study: Two Quizzes
[II.4] Different Types of Test Items
[II.4.1] PAPER-AND-PENCIL TESTS VS. COMPUTERIZED TESTS
[II.4.2] RECOGNITION VS. CONSTRUCTION
[II.4.3] PROBLEM COMPLEXITY
[II.4.4] CONVENTIONAL TESTS VS. ADAPTIVE TESTS
[II.5] Computerized Tests with <Given>, <Hint> and <Terminal>
[II.6] Diffused Attributes and Concentrated Attributes
[II.7] Alternative Method: Multi-Stage Latent Trait Approach
[II.8] Grades of Attainment
[II.9] Decomposition of the Competency Space

III. Efficient Nonparametric Approaches for Estimating the
Operating Characteristics of Discrete Item Responses

[III.1] Rationale
[III.2] Bivariate P.D.F. Approach
[III.3] Conditional P.D.F. Approach
[III.3.1] SIMPLE SUM APPROACH
[III.3.2] DIFFERENTIAL WEIGHT APPROACH

IV. Acceleration Model

[IV.1] Processing Functions
[IV.2] Criteria for Evaluating Mathematical Models
[IV.3] General Acceleration Model and a Specific Model in Which
the Logistic Function is Used

V. Further Research and Integration of Research Findings

[V.1] Further Research
[V.1.1] MLE BIAS FUNCTION
[V.1.2] CRITICAL OBSERVATIONS OF THE TEST INFORMATION
FUNCTION AS A MEASURE OF LOCAL ACCURACY
[V.1.3] PLAUSIBILITY FUNCTIONS OF DISTRACTORS
[V.1.4] ESTIMATION OF RELIABILITY COEFFICIENTS USING THE
TEST INFORMATION FUNCTION AND ITS MODIFICATIONS
[V.2] Integration of Research Findings
[V.2.1] ROLES OF FISHER TYPE INFORMATION IN LATENT
TRAIT MODELS
[V.2.2] HUMAN PSYCHOLOGICAL BEHAVIOR
[V.2.3] GRADED RESPONSE MODEL

VI. Discussion


I.  Introduction 


Roughly speaking, in the first half of the research period, the principal investigator’s effort
was mainly focused on developing theories and methodologies for cognitive diagnosis based
on latent trait models, interacting with Drs. Susan Goldman, Gautum Biswas and other
researchers at Vanderbilt University, Nashville, Tennessee, who worked on trouble-shooting in
complementary metal oxide semiconductor (CMOS) design tasks. In the second half of the
research period, emphasis was placed on writing and publishing book chapters and papers for
refereed journals on the various research topics investigated during the years of Office of
Naval Research funding, which started in 1977, while adding further research and continuing
to develop theories and methodologies for cognitive assessment. A latent trait model, called
the acceleration model, was proposed in the second half. It belongs to the heterogeneous case
of the graded response model (Samejima, 1972), and will be useful in cognitive assessment as
well as in more traditional areas to which latent trait models have been applied, such as mental
testing.

During the research period, 1990-94, 3 book chapters were written, one of which has been
published while the other two are in press; 7 papers were published or accepted for publication
in refereed journals; 1 paper was published in international conference proceedings; and 2
papers were prepared for submission to refereed journals. The titles of these works are as
follows:

[I.1] Book Chapters

[1] Roles of Fisher Type Information in Latent Trait Models, in H. Bozdogan (Ed.),
Proceedings of the First US/Japan Conference on the Frontiers of Statistical
Modeling: An Informational Approach (3 volumes), Netherlands: Kluwer Academic
Publishers. 1994.

[2] A Cognitive Diagnosis Method Using Latent Trait Models: Competency
Space Approach and its Relationship with DiBello and Stout’s Unified
Cognitive/Psychometric Diagnosis Model, in P. D. Nichols, S. E. Chipman & R.
L. Brennan (Eds.), Cognitive Diagnostic Assessment, Hillsdale, New Jersey:
Lawrence Erlbaum. (in press).

[3] Graded Response Model, in W. J. van der Linden & R. Hambleton (Eds.),
Handbook of Modern Item Response Theory, New York: Springer-Verlag. (in press).

[I.2] Refereed Journal Papers

[4] An approximation for the bias function of the maximum likelihood estimate of
a latent variable for the general case where the item responses are discrete,
Psychometrika, 58, 119-138, 1993.

[5] The bias function of the maximum likelihood estimate of ability for the
dichotomous response level, Psychometrika, 58, 195-209, 1993.

[6]  Some  critical  observations  of  the  test  information  function  as  a  measure  of  local 
accuracy  in  ability  estimation,  Psychometrika,  59,  307-329,  1994. 

[7]  Acceleration  model  in  the  heterogeneous  case  of  the  general  graded  response 
model,  Psychometrika,  (in  press). 

[8] Nonparametric estimation of the plausibility functions of the distractors of
vocabulary test items, Applied Psychological Measurement, (in press).

[9]  Estimation  of  reliability  coefficients  using  the  test  information  function  and  its 
two  modifications,  Applied  Psychological  Measurement,  (in  press). 

[10]  Efficient  nonparametric  approaches  for  estimating  the  operating  characteristics 
of  discrete  item  responses,  (accepted  by  Psychometrika). 

[I.3] Proceedings Article

[11]  Human  psychological  behavior:  viewed  from  latent  trait  models.  In  Proceedings 
of  the  IEEE  International  Workshop  on  Neuro  Fuzzy  Control,  March  22-23,  1993, 
Muroran  Institute  of  Technology,  Muroran,  Japan. 

[I.4] Papers in Preparation for Refereed Journals

[12]  Proposal  of  two  modification  formulas  of  the  test  information  function. 

[13]  A  latent  trait  model  for  the  continuous  item  response  whose  distribution  is  partly 
discrete. 

During  this  period,  the  principal  investigator  presented  2  invited  papers  at  international 
conferences,  3  invited  papers  at  domestic  conferences,  and  12  contributed  papers  at  domestic 
conferences,  as  follows: 

[I.5] Invited Paper Presentations at International Conferences

[1] Roles of Fisher type information in latent trait models. US/Japan Conference
on the Frontiers of Statistical Modeling: An Information Approach, Knoxville,
Tennessee, May 1992.

[2] Human psychological behavior: viewed from latent trait models. The IEEE
International Workshop on Neuro-Fuzzy Control, Muroran, Hokkaido, Japan, March
1993.




[I.6] Invited Paper Presentations at Domestic Conferences

[3]  Partial  credit  model  and  extensions,  (discussant  paper).  The  Annual  Meeting  of 
the  American  Educational  Research  Association,  Division  D.  Atlanta,  Georgia; 
April,  1993. 

[4] A cognitive diagnosis method using latent trait models: competency space approach
and its relationship with DiBello and Stout’s unified cognitive/psychometric
diagnosis model. ACT/ONR Conference on Alternative Diagnosis Assessment. Iowa
City, Iowa; May, 1993.

[5] Assessing dimensionality in item response theory models, (discussant paper). The
Annual Meeting of the National Council on Measurement in Education. New
Orleans, Louisiana; April, 1994.

[I.7] Contributed Paper Presentations at Domestic Conferences

[6] Differential weight procedure, a nonparametric approach for estimating the
operating characteristics of discrete item responses. The Annual Meeting of the
American Educational Research Association. Chicago, Illinois; April, 1991.

[7]  Some  considerations  for  the  refinement  of  differential  weight  procedure  of  the 
conditional  p.d.f.  approach.  ONR  Conference  on  Model-Based  Measurement. 
Princeton,  New  Jersey;  May,  1991. 

[8]  An  efficient  nonparametric  method  for  estimating  the  operating  characteristics 
of  discrete  responses.  The  Annual  Meeting  of  the  Psychometric  Society.  New 
Brunswick,  New  Jersey;  June,  1991. 

[9] Usefulness of latent trait approach in cognitive diagnosis. ONR Conference on
Cognitive Diagnosis. Pittsburgh, Pennsylvania; October, 1991.

[10]  Prospects  of  applications  of  nonparametric  methods  of  estimating  the  operating 
characteristics  in  educational  measurement  (round  table  session).  The  Annual 
Meeting  of  the  National  Council  on  Measurement  in  Education.  San  Francisco, 
California;  April,  1992. 

[11]  Comparisons  of  the  estimated  operating  characteristics  obtained  by  the  simple  sum 
and  the  differential  weight  procedures  both  in  conventional  and  adaptive  testings. 
The  Annual  Meeting  of  the  American  Educational  Research  Association.  San 
Francisco,  California;  April,  1992. 

[12]  A  Design  of  Cognitive  Diagnosis.  ONR  Conference  on  Cognitive 
Diagnosis/Model-Based  Measurement.  Champaign,  Illinois;  June,  1992. 

[13] Two modification formulae of the test information function based upon the MLE
bias function. The Annual Meeting of the Psychometric Society. Columbus,
Ohio; July, 1992.

[14]  Reliability  coefficient  and  standard  error  of  measurement  viewed  from  latent  trait 
models  and  their  effective  use.  The  Annual  Meeting  of  the  American  Educational 
Research  Association.  Atlanta,  Georgia;  April,  1993. 

[15] Comparison of the predicted reliability coefficients using the test information
function and its two modification formulae. The Annual Meeting of the Psychometric
Society. Berkeley, California; July, 1993.

[16] Cognitive diagnosis using latent trait models, in the symposium, Approaches to
Cognitive Modeling. The Annual Meeting of the National Council on Measurement
in Education. New Orleans, Louisiana; April, 1994.

[17] Acceleration model: a family of graded response or partial credit models in the
heterogeneous case. The Annual Meeting of the Psychometric Society. Champaign,
Illinois; June, 1994.

During this period, the principal investigator gave two seminars as the sole speaker,
as follows:

[I.8] Seminars

[1]  Acceleration  model  and  its  use  in  the  competency  space  approach  for  cognitive 
diagnosis.  Educational  Testing  Service,  Princeton,  New  Jersey;  March,  1994. 

[2] Item response theory. Tokyo Institute of Technology, Tokyo, Japan; August,
1994.

During this period, the principal investigator received the following honors.

[I.9] Awards

[1] Appointment to the Board of Trustees of the Psychometric Society. (1989-)1990.

[2] Outstanding Technical Contribution Award from the National Council on
Measurement in Education, at the Annual Meeting of the National Council on
Measurement in Education, Chicago, Illinois, April, 1991.

References 

[1] Samejima, F. (1972). A general model for free-response data. Psychometrika Monograph, No. 18.

II. Competency Space Approach to Cognitive Diagnosis

A  comprehensive  methodology  for  cognitive  diagnosis  has  been  developed  (Samejima,  1991, 
1992,  1994b,  1994e)  by  systematically  using  intensive  observations  of  subjects’  behavior  and 
advanced  theories  and  methodologies  originally  developed  in  psychometrics,  taking  advantage 
of  advanced  computer  technologies.  For  convenience,  in  this  section,  circuit  design  is  used  as 
the  task  the  examinee  works  on. 

[II.1] Method

The  following  three  steps  are  taken  in  our  method. 

(a)  The  subject’s  behavior  in  circuit  design  is  observed  and  the  results  are  analyzed, 
using  well-defined  attributes. 

(b) The domain knowledge “plus” is investigated by means of testing, in the broad
sense of the word, using latent trait models, and the attributes are related to the
domain knowledge “plus”.

(c)  The  above  two  processes  are  repeated,  with  dynamic  interactions  between  the  two. 

Here by an attribute we mean any behavior related to cognitive diagnosis, including an
episode, a specific sequence of episodes, a behavior pattern representing a buggy strategy, etc.
Basically, attributes are observable.

Two samples of subjects are needed in this approach. By Sample 1 we mean several hundred
(or more) individuals representing the target population, for whom a specified cognitive
diagnosis is considered. We need this sample for operationally defining the competency space,
which represents domain knowledge “plus”. The actual process will be initiated by administering
the tests developed for this purpose. By Sample 2 we mean a smaller group of individuals
sampled from the same population, who conduct actual, intensive tasks (e.g., circuit design) in
the experimental situation.

Sample  2  can  be  a  subgroup  of  Sample  1.  If  this  is  not  the  case,  then  these  individuals  in 
Sample  2  must  take  the  computerized  adaptive  versions  of  the  same  tests  so  that  their  positions 
in  the  competency  space  can  be  estimated. 

[II.2] Domain Knowledge “Plus”

Domain knowledge in circuit design and electronic trouble-shooting includes an understanding
of Boolean algebra, truth tables, Karnaugh maps (K-maps), logic gates, their relationships
with each other, etc. These can be approached by means of tests. Tests can also deal with
complete design tasks for combinational circuits, for example, especially if we use
computerized tests. Such tasks are beyond the level of domain knowledge, and correspond to
the “plus” in the subtitle.


[II.3] Preliminary Study: Two Quizzes

Two quizzes were given to Vanderbilt engineering sophomores, who were in one of Dr.
Bharat Bhuva’s courses. Quiz 1 has 4 questions and was given to 31 students, and Quiz 2 has
16 questions and was given to 24 students, a subset of the 31 who had taken Quiz 1.

In  contents,  Quiz  1  has  two  categories:  (a)  to  generate  a  truth  table  for  a  specified  gate  (2 
questions),  and  (b)  to  implement  a  specified  2-input  gate  using  a  specified  type  of  gate  only  (2 
questions);  Quiz  2  has  four  categories  to  question  equivalence  of:  (c)  a  K-map  to  a  truth  table 
(4  questions),  (d)  a  K-map  to  a  Boolean  equation  (4  questions),  (e)  a  Boolean  equation  to  a 
circuit  (3  questions),  and  (f)  two  Boolean  expressions  (5  questions).  The  average  proportions 
correct  are:  (a)  0.935,  (b)  0.306,  (c)  0.677,  (d)  0.958,  (e)  0.639  and  (f)  0.833,  respectively.  With 
these  small  samples  of  31  and  24  students,  the  resulting  item  score  matrices  look  as  if  these 
tasks  could  be  interpreted  by  one  dimension,  although  some  deviations  are  suggested  in  (e)  and 
(f)  of  Quiz  2. 

However,  in  operationally  defining  the  competency  space  for  the  electronic  trouble-shooting 
and  circuit  design,  a  larger  dimensionality  is  expected. 

[II.4] Different Types of Test Items

There are different types of test items, each of which has its own merits and demerits.

[II.4.1] PAPER-AND-PENCIL TESTS VS. COMPUTERIZED TESTS

Paper-and-pencil testing enables us to collect data within a limited amount of time for a
large sample of examinees, whereas computerized testing requires a greater amount of time
and also costs more. In comparison with paper-and-pencil testing, however, computerized
testing enables us to conduct more intensive research in a tractable environment, and to
trace the examinee’s behavior sequentially and in more detail.

It will be beneficial for our research, therefore, to use both and make the best use of their
separate strengths. From a practical standpoint, a ratio of 8:2, 7:3 or 6:4 of
paper-and-pencil to computerized test items may be desirable.

[II.4.2] RECOGNITION VS. CONSTRUCTION

A type of test item widely used for recognition is the multiple-choice test item, while a widely
used type for construction, which can also be called component design tasks in the research of
circuit design, is the open-ended test item.

In general, construction tasks may be more appropriate, but recognition tasks take less time
to administer, and can be used effectively for certain questions, especially if we select
appropriate distractors in multiple-choice test items and also devise ways to discourage random
or educated guessing. One such device may be to accommodate more than one correct answer
without telling the examinees how many alternative answers are correct in a specific item, with
an instruction such as: Find all expressions that are equivalent to <Given>.

Effective use of distractors in a multiple-choice test item may be exemplified by a
question such as the following: with <Given> $\overline{(x + y)}$ as a Boolean expression, the
set of alternative answers in <Terminal> includes $\bar{x} + \bar{y}$, a common misconception,
or buggy DeMorgan, in addition to the correct answer, $\bar{x}\bar{y}$. In this way,
multiple-choice test items can be used effectively for detecting the adoption of erroneous
rules. Plausibility functions of distractors of multiple-choice test items (Samejima, 1984,
1994a) can be estimated using a nonparametric approach.
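
The distinction between the correct answer and the buggy DeMorgan distractor can be checked with a truth table. The following is a minimal sketch and not part of the original study; it assumes that <Given> is the complement of x + y, and the function names are hypothetical.

```python
from itertools import product

def given(x, y):
    # <Given>: NOT (x OR y)
    return not (x or y)

def correct_answer(x, y):
    # Correct use of DeMorgan's law: (NOT x) AND (NOT y)
    return (not x) and (not y)

def buggy_demorgan(x, y):
    # Common misconception: (NOT x) OR (NOT y)
    return (not x) or (not y)

def equivalent(f, g):
    # Two Boolean expressions are equivalent if they agree on every input.
    return all(f(x, y) == g(x, y) for x, y in product([False, True], repeat=2))

def disagreements(f, g):
    # Inputs (x, y) on which the two expressions take different values.
    return [(x, y) for x, y in product([False, True], repeat=2) if f(x, y) != g(x, y)]
```

Under these assumptions the distractor differs from <Given> only when exactly one of x and y is true, which is one reason why an examinee who checks a single convenient case may fail to notice the error.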

[II.4.3]  PROBLEM  COMPLEXITY 

Various levels of problem complexity can be conceived of, including single-step problems as
well as multi-step problems. Roughly speaking, paper-and-pencil testing can handle single-step
problems and relatively simple multi-step problems without difficulty, while with computerized
testing we can deal with more complex multi-step problems, tracing the examinee’s cognitive
processes and obtaining much more detailed information about them.

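Take Question 14 of Quiz 2 as an example. In the quiz, the examinee is asked to decide if
the Boolean equation

$$\bar{q}\bar{p} + \bar{q}p + \bar{p}q = \overline{pq}$$

is true or false. To answer this question, perhaps the backward process will be easier, which
includes DeMorgan’s law, tautology, distributive law, commutative law and absorption as shown
below.

$$
\begin{aligned}
\overline{pq} &= \bar{p} + \bar{q} && \text{DeMorgan's law} \\
&= \bar{p}(q + \bar{q}) + \bar{q}(p + \bar{p}) && \text{tautology} \\
&= \bar{p}q + \bar{p}\bar{q} + \bar{q}p + \bar{q}\bar{p} && \text{distributive law} \\
&= \bar{q}\bar{p} + \bar{q}\bar{p} + \bar{q}p + \bar{p}q && \text{commutative law} \\
&= \bar{q}\bar{p} + \bar{q}p + \bar{p}q && \text{absorption}
\end{aligned}
$$
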

This  example  belongs  to  the  category  of  relatively  simple  multi-step  problems.  Note  that  not 
all  steps  are  equally  easy  or  difficult.  In  the  above  example,  the  use  of  tautology  may  be  more 
difficult  than  the  use  of  commutative  law,  for  example. 
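
The backward derivation can be verified step by step with a truth table. The sketch below is illustrative only. It assumes that the right-hand side of the equation is the complement of pq and that the left-hand side is (not q)(not p) + (not q)p + (not p)q, which is the reading implied by the laws named in the derivation; the names `steps`, `truth_table` and `tables` are hypothetical.

```python
from itertools import product

# Each line of the derivation, written as a Boolean function of (p, q).
steps = [
    ("start", lambda p, q: not (p and q)),
    ("DeMorgan's law", lambda p, q: (not p) or (not q)),
    ("tautology", lambda p, q: ((not p) and (q or not q)) or ((not q) and (p or not p))),
    ("distributive law", lambda p, q: ((not p) and q) or ((not p) and (not q))
                                      or ((not q) and p) or ((not q) and (not p))),
    ("commutative law", lambda p, q: ((not q) and (not p)) or ((not q) and (not p))
                                     or ((not q) and p) or ((not p) and q)),
    ("absorption", lambda p, q: ((not q) and (not p)) or ((not q) and p) or ((not p) and q)),
]

def truth_table(f):
    # Values of f over (p, q) = (F, F), (F, T), (T, F), (T, T).
    return tuple(bool(f(p, q)) for p, q in product([False, True], repeat=2))

tables = {name: truth_table(f) for name, f in steps}

# The equation is true if every line of the derivation has the same truth table.
all_equal = len(set(tables.values())) == 1
```

Every line yields the same table, false only when p and q are both true, so under this reading the answer to the question is "true".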

[II.4.4] CONVENTIONAL TESTS VS. ADAPTIVE TESTS

Conventional tests are represented by paper-and-pencil tests and computerized tests, the
latter of which have been rapidly put into practice in the past decade. Computerized adaptive
tests have also become a reality as advanced technologies have made them more and more
feasible. A strength of adaptive testing is that only a tailored subset of the total set of items,
or itempool, is administered to an individual subject, and yet the loss in accuracy of estimation
of his ability can be small.
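
The idea of a tailored subset can be illustrated with a small item-selection routine. This is a minimal sketch under assumptions that the report does not state: it uses a two-parameter logistic model for dichotomous items and selects the unused item with the largest item information at the current ability estimate. The itempool and all function names are hypothetical.

```python
import math

def p_correct(theta, a, b):
    # Assumed two-parameter logistic model: probability of a correct response.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    # Item information for the two-parameter logistic model.
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, pool, administered):
    # Choose the unused item that is most informative at the current estimate.
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, *pool[i]))

# Hypothetical itempool of (discrimination a, difficulty b) pairs.
pool = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]

first = select_next_item(0.9, pool, set())
second = select_next_item(0.9, pool, {first})
```

With equal discriminations, the routine picks the items whose difficulties lie closest to the current estimate, which is why a short adaptive test can lose little accuracy relative to the full itempool.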

In  the  present  methodology,  first,  conventional  tests  will  be  administered  to  Sample  1  and 
these  results  will  lead  to  the  definition  of  the  competency  space,  including  the  discovery  of 
its  dimensionality,  and  also  to  the  item  calibration  of  the  itempool.  After  these  have  been 
accomplished,  we  will  switch  to  computerized  adaptive  tests,  using  the  results  of  the  item 
calibration.  Each  individual  of  Sample  2  will  take  the  adaptive  tests,  unless  he  has  already 
taken  the  original  conventional  tests. 

A strength of the computerized adaptive test also exists in item calibration. It has been
shown that on-line item calibration can be conducted just as accurately as conventional
item calibration, in spite of the fact that the number of test items given to an individual
subject is much smaller (e.g., 15 vs. 50) than that of the conventional test, or itempool, and
the number of examinees for the computerized adaptive test is also much smaller (e.g., 1,500
vs. 3,000) than that for the conventional test (see Samejima, 1988, 1990).

[II.5] Computerized Tests with <Given>, <Hint> and <Terminal>

If we insert <Hint> between <Given> and <Terminal>, then <Hint> will control the
difficulty of the question. Two types of sequential presentation of <Hint> are conceivable,
both starting with no <Hint> in case the examinee can solve the problem without depending
upon any given hints. In the example given in [II.4.3], the left-hand side of the first line will
become <Given>, and the last line will be <Terminal> (or vice versa), and each line from the
top can be presented as a hint, thus constructing a sequence of hints.

Another method is to start with a more difficult hint, and, if the examinee fails to supply
the remaining processes, a more obvious hint will be given, and so on. In the previous
example, suppose that <Given> and <Terminal> are reversed. Then the first step will be to
add $\bar{q}\bar{p}$ at the beginning of the expression in <Given>. The first hint may be

$$a + b = a + b + b$$

If  the  examinee  cannot  use  the  hint  for  solving  the  problem,  then  delete  it,  and  present  a 
stronger  hint  such  as 


$$a + b = a + a + b$$

If  this  still  does  not  work,  then  we  may  replace  it  by 

$$a + b + c = a + a + b + c,$$


which  is  more  suggestive  than  the  previous  two.  Inclusion  of  items  of  this  type  is  important, 
and  the  graded  response  model  (Samejima,  1969,  1972,  1994c)  is  readily  applicable.  To  a  lesser 
extent,  we  can  also  incorporate  items  of  this  type  in  paper-and-pencil  tests. 
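
To show how a hint sequence can yield graded item scores, the sketch below computes category probabilities under the logistic form of the graded response model in the homogeneous case. It is an illustration only. The scoring rule (a higher grade for solving the item with fewer hints), the parameter values and the function names are assumptions and are not taken from the report.

```python
import math

def cumulative(theta, a, b):
    # Probability of attaining a given grade or higher (logistic form).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def category_probabilities(theta, a, thresholds):
    # thresholds: increasing difficulty parameters b_1 < ... < b_m.
    # Returns the probabilities of grades 0, 1, ..., m at ability theta.
    p_star = [1.0] + [cumulative(theta, a, b) for b in thresholds] + [0.0]
    return [p_star[k] - p_star[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item with three hints, hence four grades:
# 0 = not solved, 1 = solved after the third hint,
# 2 = solved after the second hint, 3 = solved with at most the first hint.
probs = category_probabilities(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
```

The category probabilities are differences of adjacent cumulative probabilities, so they are positive whenever the thresholds are strictly increasing and they sum to one at every ability level.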

There  can  be  other  paths  to  the  <Terminal>  besides  those  that  go  through  given  <Hint>. 
Does  each  path  test  the  same  rules?  Nonparametric  approaches  for  estimating  the  operating 
characteristics  (Samejima,  1981,  1988,  1990,  1994d)  will  eventually  discover  the  answer  to  this 
question. 

[II.6] Diffused Attributes and Concentrated Attributes

In the nonparametric approach for estimating the operating characteristics (Samejima, 1981,
1988, 1990, 1994d), several ways of approximating the conditional distribution of ability, given
its maximum likelihood estimate, have been introduced, using the method of moments for fitting
a least squares polynomial to the set of maximum likelihood estimates of ability (cf. Samejima
& Livingston, 1979).

Suppose  that  the  abscissa  of  each  of  the  two  figures,  which  are  presented  in  Figure  2-1, 
is  a  dimension  of  the  competency  space,  and  the  ordinate  indicates  the  maximum  likelihood 
estimate  of  the  construct  represented  by  the  abscissa.  In  these  figures,  the  estimated  position  of 
each  of  the  eight  subjects  is  shown  by  an  arrow,  and  the  corresponding  approximated  conditional 
density  function  of  the  construct  is  drawn. 

Let  us  assume,  for  example,  there  are  three  disjoint  attributes,  and  they  distribute  among 
the  eight  subjects  in  the  way  that  the  areas  under  the  curves  of  the  conditional  density  functions 
are  shaded  in  the  left-hand-side  figure.  Then  the  proportioned  marginal  density  functions  of 
the  three  attributes  will  be  as  shown  at  the  bottom  of  the  figure.  If  this  is  the  case,  we  shall 
say  that  the  disjoint  attributes  are  concentrated  with  respect  to  the  competency  dimension. 
When a single attribute has a proportioned marginal density function which is similar to the
three proportioned marginal density functions in the figure, we will also say that the attribute
is concentrated. Such a result suggests that this competency dimension is closely related to
the attribute in question.

If, in contrast, the proportioned marginal density function of an attribute is flat
over a wide range of the dimension, as is illustrated in the right-hand-side figure, we shall say
that the attribute is diffused with respect to the competency dimension. Two possible
interpretations of such a result may be:

1. the competency dimension has nothing to do with the attribute, and, if this happens
to every dimension, then a larger dimensionality of the competency space will be
needed, and

2. the attribute has multiple meanings.

[Figure 2-1: concentrated attributes (left) and diffused attribute (right).]

Suppose that the attribute is a pattern of behavior. If it turns out to be concentrated, then
we should investigate, by interviewing these subjects, etc., whether the attribute represents
a specific family of strategies which is prone to be taken by subjects whose positions on this
competency dimension are in the concentrated range. If this is confirmed, then in the future
we should anticipate that a strategy in this family will be taken with a high probability by
individuals in this range of the competency dimension; this is itself a finding of the
research. If it turns out to be diffused with respect to every dimension of our competency
space, then we should investigate whether the pattern of behavior has multiple meanings, that
is, whether it belongs in common to separate families of strategies. If the answer is negative,
then we must investigate the possibility of developing an additional set of test items to
increase the dimensionality of the competency space.
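
The contrast between a concentrated and a diffused attribute can be made concrete with a small numerical sketch. It is not the procedure used in the report. It assumes that each subject's conditional density of ability, given the maximum likelihood estimate, is approximated by a normal density centered at the estimate with a common standard deviation, and it uses a simple average of these densities in place of the proportioned marginal density function. All names and values are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    # Normal density, used here as a stand-in for the conditional density.
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def marginal_density(grid, estimates, sd):
    # Average the conditional densities of the subjects showing the attribute.
    return [sum(normal_pdf(x, m, sd) for m in estimates) / len(estimates) for x in grid]

def spread(grid, density):
    # Standard deviation of the density after renormalizing it on the grid.
    total = sum(density)
    mean = sum(x * d for x, d in zip(grid, density)) / total
    var = sum((x - mean) ** 2 * d for x, d in zip(grid, density)) / total
    return math.sqrt(var)

grid = [i / 10.0 for i in range(-50, 51)]

# Subjects showing the attribute lie close together on the dimension ...
concentrated = marginal_density(grid, [0.4, 0.5, 0.6], sd=0.3)
# ... or are scattered over a wide range of the dimension.
diffused = marginal_density(grid, [-2.0, 0.0, 2.0], sd=0.3)
```

A small spread corresponds to a concentrated attribute and a large spread to a diffused one; any cutoff between the two would have to be chosen by the researcher and is not specified here.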

As we increase the size of Sample 2, we shall be able to estimate the operating characteristic
of a specific attribute more and more accurately, using a nonparametric approach for estimating
the operating characteristic of any discrete response, such as the Conditional P.D.F. Approach,
which is described and discussed in Section III.

[II.7] Alternative Method: Multi-Stage Latent Trait Approach

An alternative method for using Samples 1 and 2 is to combine them into one sample,
which includes several hundred to one thousand individuals. In this method, first we need to
develop software for, say, problem solving and/or designing tasks, after intensive pilot studies.
With today’s computer technologies and their availability, it is possible to administer a set of,
say, 30 items to several hundred individuals within a couple of months, if 10 to 15 carry-on
microcomputers and the same number of testers are available. This sample size is comparable
to typical sample sizes when paper-and-pencil tests are used in college environments.

Advancement of computer technologies has made it possible to use figural responses in
computerized experiments, by using a mouse. This is especially beneficial to circuit design
tasks on the gate level. Another advantage of responses using a mouse is that the number of
casual mistakes in responding will decrease, in comparison with responses using
the keyboard.

This method includes technologies which enable us to:

1. control an experimental situation by identical software installed on microcomputers,

2.  have  human  subjects  work  on  problem  solving  or  designing  tasks  presented  on  their 
monitor  screens, 

3.  have  the  microcomputers  record  their  cognitive  processes,  and 
4.  have  the  computers  analyze  and  evaluate  the  subjects’  performances. 

Thus  methodologies  originated  in  psychometrics  could  be  adopted  in  cognitive  psychology  both 
in  depth  and  perspective,  which  includes  problem  solving,  trouble-shooting,  etc. 

If  a  sufficient  research  fund  is  available,  this  alternative  method  will  be  more  fruitful.  It  is 
possible  that  the  gate  level  trouble-shooting  or  circuit  design  tasks  themselves  can  be  incorporated 
into  the  software,  making  use  of  figural  responses  by  a  mouse.  Thus  dynamic  interactions 
of  microscopic  and  macroscopic  approaches  will  be  realized,  enhancing  the  productivity  of 
research. 

Use of the nonparametric approach (Samejima, 1981, 1988, 1990, 1994d) for discovering the
meanings of patterns of behavior has an important role in the multi-stage latent trait approach.
This is especially so in the presence of multiple strategies in problem solving, multiple correct
solutions, buggy strategies, and other factors that make understanding and evaluating chunks
of behavior difficult. Samejima (1984, 1994a) used a two-stage latent trait approach to discover
the  plausibility  function  of  each  distractor  of  each  multiple-choice  item  of  a  vocabulary  test. 
The  multi-stage  latent  trait  approach  is  similar  in  principle  to  this  method. 

In  practice,  in  spite  of  complexities  in  understanding  and  evaluating  chunks  of  behavior,  it 
is  likely  that  the  performance  of  each  individual  can  be  dichotomously  scored  as  solution  and 
nonsolution  with  negligible  ambiguity,  as  is  the  case  with  multiple-choice  test  items.  Thus  on 
the  first  stage  of  the  multi-stage  latent  trait  approach  each  problem  or  designing  will  be  treated 
as  a  dichotomous  item.  Even  if  there  are  multi-strategies,  if  a  single  correct  answer  exists,  the 
item  should  be  scored  either  0  or  1,  ignoring  different  strategies.  If  multi-correct  answers 
exist,  in  general,  1  should  be  given  to  all  correct  answers.  This  is  a  tentative  treatment,  and 
the  separate  operating  characteristics  for  the  separate  multi-correct  answers  will  be  estimated 
later. 
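
The first-stage scoring rule described above can be sketched as follows. This is only an illustration: the function name `score_response` and the example answer labels are hypothetical and do not come from the report.

```python
def score_response(answer, correct_answers):
    """Return 1 (solution) if the answer is among the correct answers, else 0.

    The strategy used to reach the answer is deliberately ignored, as on the
    first stage of the multi-stage latent trait approach.
    """
    return 1 if answer in correct_answers else 0


# Hypothetical item with two correct designs; both receive a score of 1.
correct_designs = {"design_A", "design_B"}
scores = [score_response(r, correct_designs)
          for r in ["design_A", "design_C", "design_B"]]
```

Treating all correct answers alike is the tentative treatment mentioned in the text; the separate operating characteristics of the correct answers are left to the later stage.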

Factor  analysis  will  be  used  to  find  out  the  dimensionality  of  the  latent  space.  If  more 
than  one  dimension  is  found,  it  will  be  wise  to  treat  each  dimension  separately,  adopting 
unidimensional  latent  trait  models  as  long  as  a  simple  structure  exists,  instead  of  turning  to 
multidimensional  latent  trait  models. 

Some  appropriate  model  for  the  dichotomous  item,  such  as  the  normal  ogive  model  or  the 
logistic  model,  can  be  adopted  for  each  latent  dimension.  Model  validation  for  the  adopted 
model  will  be  made  on  the  second  stage. 

A  strength  of  the  nonparametric  approach  developed  by  the  principal  investigator,  which  is 
introduced  and  discussed  in  Section  3,  is  that  it  can  be  used  for  relatively  small  sets  of  data 
with,  say,  several  hundred  to  one  thousand  subjects.  It  is  based  on  the  Old  Test,  consisting 
of  items  whose  characteristics  are  known.  In  the  multi-stage  latent  trait  approach  the  set  of 
dichotomously  scored  items  on  the  first  stage  can  be  used  as  the  Old  Test. 

Based  on  this  Old  Test  the  operating  characteristic  of  the  solution  will  be  estimated  for  model 
validation.  If  the  resulting  curve  of  the  estimated  operating  characteristic  is  close  enough  to 
the  assumed  parametric  operating  characteristic,  then  the  adopted  model  is  validated;  if  not, 
it  is  invalidated  and  the  process  must  be  repeated  by  adopting  another  dichotomous  model. 

After a model has been validated, using the nonparametric approach the operating
characteristic of each chunk of behavior observed in each problem or designing can be estimated, and
discoveries of their meanings will follow. For multi-correct solutions the operating
characteristics can be estimated for separate correct solutions, and the results will clarify the order of the
separate correct solutions.

The  sets  of  resulting  nonparametrically  estimated  operating  characteristics  may  indicate 
which  mathematical  model  is  applicable.  In  multi-correct  solution  and/or  multi-strategy  cases 
a  model  developed  for  such  cases  (e.g.,  Samejima,  1983)  will  be  necessary.  Otherwise,  they  may 
direct  us  to  the  homogeneous  case,  and  one  of  the  models  such  as  the  normal  ogive  model  and 
the  logistic  model  may  be  appropriate;  if  they  direct  us  to  the  heterogeneous  case,  then  adoption 
of  the  acceleration  model  may  be  appropriate.  In  the  latter  case,  a  tentative  parameterization 
of  the  nonparametrically  estimated  cumulative  operating  characteristics,  using  a  very  general 
semiparametric  method  (e.g.,  Ramsay  &  Wong,  1993)  will  be  needed  (see  Samejima,  1994c). 

[II.8]  Grades  of  Attainment 

It has been customary that diagnosis is made dichotomously, that is, individuals are
categorized either in mastery or in nonmastery with respect to a given attribute, as exemplified by
Tatsuoka’s studies (Tatsuoka, 1985, 1990). Since each attribute involved in a task gets more
and more complicated as mental processes get higher, however, mastery of an attribute requires
a sequence of subprocesses. To give an example, consider the attribute, fraction, which is used
by DiBello, Stout and Roussos (1993). They provide us with several items requiring this
attribute, and one of them is item 3: “Solve: 7 - 2x = 9 + 3x.” It is noted that, in solving this
problem, we need the understanding of the concept of fractions, but we do not need the mastery
of the fraction, which includes addition, subtraction, multiplication and division of numerical
and algebraic functions including fractions. If we grade the attainment for the fraction skills 0
(= no understanding), 1 (= understanding the concept of fractions) and 2 (= mastery), instead
of 0 (= non-mastery) and 1 (= mastery), then all we will need is grade 1 attainment in solving
the above equation.

Thus  in  order  to  make  an  accurate  cognitive  diagnosis,  introduction  of  the  concept  of  grades 
of  attainment  (Samejima,  1994b)  is  advisable.  This  indicates  that  we  need  to  turn  to  an 
appropriate  graded  response  model,  which  is  discussed  in  Section  4. 

Grades of attainment are further discussed, together with comments on the DiBello-Stout
diagnosis model (DiBello, Stout & Roussos, 1993), in Samejima, 1994b, and the reader is directed
to this book chapter.

[II.9]  Decomposition  of  the  Competency  Space 

The competency space represents not only domain knowledge but also the dynamics of putting
mastered chunks of knowledge in appropriate configurations, which include
discovery of implications of what has been learned, integration and restructuring of the acquired
domain knowledge, identification of necessary information in our long term memory, creation
of new, innovative structures of the domain knowledge, etc. This is the reason why we say that
the competency space represents domain knowledge plus.

Let $\Theta$ denote the total competency space, and let it be decomposed in such a way that

$$\Theta' = [\Theta_a' , \; \Theta_b'] ,$$

where $\Theta_a$ represents masteries of attributes and $\Theta_b$ consists of dimensions of dynamics which
are beyond mastery of attributes (see Samejima, 1994b, 1994e). Thus diagnosis will be made
in each of the two subspaces.

In cognitive diagnosis, this second subspace has rather been neglected. In some situations,
however, diagnosis in $\Theta_b$ is more important, as exemplified by selection of Ph.D. candidates.
There are graduate students who can do course work well, but are poor in designing dissertation
research, for example. If diagnosis in $\Theta_b$ can be done well before the decision of acceptance or
rejection of applicants to a graduate program is made, this type of student will be screened
and rejected.

For  further  details  concerning  this  subject,  the  reader  is  directed  to  Samejima,  1994b. 

III.  Efficient  Nonparametric  Approaches  for  Estimating  the 
Operating  Characteristics  of  Discrete  Item  Responses 

The principal investigator has been engaged in developing a family of approaches and
methods for estimating the operating characteristic of a discrete item response, or the conditional
probability, given latent trait, that the examinee’s response be that specific response (Samejima,
1977b, 1981, 1988, 1990b). These methods are characterized by the facts that:

1.  estimation  is  made  without  assuming  any  mathematical  forms,  and 

2.  it  is  based  upon  a  relatively  small  sample  of  several  hundred  to  a  few  thousand  examinees. 

In this research period, the rationale and the actual procedures of two nonparametric approaches,
Bivariate P.D.F. Approach and Conditional P.D.F. Approach, the latter of which includes
Simple Sum Procedure and Differential Weight Procedure, were integrated under the title,
Efficient nonparametric approaches for estimating the operating characteristics of discrete item
responses. In this paper, some examples of the results obtained by the Simple Sum Procedure
and the Differential Weight Procedure of the Conditional P.D.F. Approach were given, using
simulated data, and the usefulness of these nonparametric methods was also discussed. The
paper was submitted to Psychometrika, and was accepted with minor modifications (see Section
1). Since the modifications have not yet been made and it will take some time before it is published
in Psychometrika, the outline of this paper is presented in this section.

In estimating the operating characteristic of a discrete response, or the conditional
probability, given ability, with which the discrete response occurs, there are two conceivable general
approaches. One is the parametric approach, in which a specific mathematical model is
assumed so that the estimation of the operating characteristic is reduced to the estimation of its
item parameters. The other is the nonparametric approach, in which no mathematical model
is involved, that is, estimation is made without assuming any mathematical forms for the
operating characteristic. The usefulness of the nonparametric approach lies in the fact that it
allows researchers to venture into new areas by discovering the true shapes of the operating
characteristics rather than molding them a priori into specific mathematical forms.

Lord has developed a nonparametric method to estimate the operating characteristic and
applied it to SAT Verbal test items (Lord, 1970), and the results led him to conclude that
Birnbaum’s three-parameter logistic model (Birnbaum, 1968) fitted well to the nonparametrically
estimated item characteristic curves of these items. Samejima proposed Normal Approximation
Method (Samejima, 1977b), and then several other nonparametric methods (Samejima, 1981,
1988, 1990b). Levine developed a nonparametric method based upon the multilinear formula
scoring theory (Levine, 1984). While Lord’s method is focused upon a large set of data such
as those available at the Educational Testing Service, for example, Samejima’s and Levine’s
methods make use of a relatively small set of data collected for, say, several hundred to a few
thousand examinees. The latter methods can effectively be used for the on-line item calibration
in computerized adaptive testing as well as for the item calibration in paper-and-pencil testing.


[III.1]  Rationale 

Let $\theta$ be ability, or latent trait, which takes on any real number. It is assumed that there
is a set of test items measuring $\theta$ whose characteristics are known. This set of test items is
called Old Test. Let $f(\theta)$ be the probability density function of $\theta$, $g$ denote a target test
item for which the operating characteristics of the discrete responses are to be estimated, $K_g$
be the discrete response to item $g$, and $k_g$ denote a specific discrete response.

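The joint density function, $\xi(k_g, \theta)$, of the discrete item response $k_g$ to the target item
$g$ and ability $\theta$ is expressed as

$$\xi(k_g, \theta) = f(\theta) \; \mathrm{prob.}[K_g = k_g \mid \theta] ,$$

which leads to

$$f(\theta) = \sum_{k_g} \xi(k_g, \theta) .$$

Thus the operating characteristic, $P_{k_g}(\theta)$, of the discrete item response $k_g$, or the conditional
probability assigned to $k_g$, given $\theta$, is provided by

$$P_{k_g}(\theta) = \mathrm{prob.}[K_g = k_g \mid \theta] = \frac{\xi(k_g, \theta)}{f(\theta)} = \frac{\xi(k_g, \theta)}{\sum_{i \in K_g} \xi(i, \theta)} . \tag{3.1}$$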

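Suppose that $\tau$ is a one-to-one mapping of $\theta$ which satisfies

$$\frac{d\theta}{d\tau} > 0 .$$

Then $P_{k_g}(\theta)$ can be written, analogously, as

$$P_{k_g}(\theta) = \mathrm{prob.}[K_g = k_g \mid \tau] = \frac{\xi^*(k_g, \tau)}{f^*(\tau)} = \frac{\xi^*(k_g, \tau)}{\sum_{i \in K_g} \xi^*(i, \tau)} , \tag{3.2}$$

where $f^*(\tau)$ and $\xi^*(i, \tau)$ are the density function of the transformed ability $\tau$ and the
joint density function of the discrete item response $i \in K_g$ and $\tau$, respectively. Note the
relationships

$$f^*(\tau) = f(\theta) \, \frac{d\theta}{d\tau}$$

and

$$\xi^*(k_g, \tau) = f^*(\tau) \; \mathrm{prob.}[K_g = k_g \mid \tau] = f^*(\tau) \, P_{k_g}(\theta) = \xi(k_g, \theta) \, \frac{d\theta}{d\tau} ,$$

which are obtainable from the definition of $\tau$, and (3.1) and (3.2). For simplicity, hereafter,
$k_g$ will be used for both a specific discrete item response to item $g$, and the event $K_g = k_g$.
Similar usage of symbols will be made for certain other concepts and events.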
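
The invariance expressed by (3.1) and (3.2) can be checked numerically. The following is a minimal sketch under stated assumptions: a standard normal $f(\theta)$, a dichotomous logistic target item, and the strictly increasing map $\theta = \sinh(\tau)$ are all hypothetical choices made only for illustration, and none of them is taken from the report.

```python
import math


def normal_pdf(x):
    """Assumed ability density f(theta): standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p_correct(theta, a=1.2, b=0.3):
    """Assumed operating characteristic of the correct response (logistic)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def theta_of_tau(tau):
    """Hypothetical one-to-one mapping with d(theta)/d(tau) = cosh(tau) > 0."""
    return math.sinh(tau)


def operating_characteristic_via_tau(tau):
    """Right-hand side of (3.2): xi*(k_g, tau) / sum_i xi*(i, tau)."""
    theta = theta_of_tau(tau)
    f_star = normal_pdf(theta) * math.cosh(tau)      # f*(tau) = f(theta) d(theta)/d(tau)
    xi_star = {1: f_star * p_correct(theta),         # joint density, correct response
               0: f_star * (1.0 - p_correct(theta))}  # joint density, incorrect response
    return xi_star[1] / sum(xi_star.values())


value_tau_scale = operating_characteristic_via_tau(0.5)
value_theta_scale = p_correct(theta_of_tau(0.5))
```

The two values agree because the factor $d\theta/d\tau$ appears in both the numerator and the denominator of (3.2) and cancels.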

Let $h$ $(= 1, 2, \ldots, n)$ denote an item of the Old Test, $k_h$ be a discrete response to item $h$,
and $P_{k_h}(\theta)$ denote the operating characteristic of $k_h$, or the conditional probability assigned
to $k_h$, given $\theta$. It is assumed that $P_{k_h}(\theta)$ is three-times differentiable with respect to $\theta$.

A response pattern based upon the Old Test, which is denoted by $V$, is given by

$$V = \{K_h\}' ,$$

and its realization, $v$, can be written as

$$v = \{k_h\}' .$$

Throughout the rest of this paper local independence (Lord & Novick, 1968) is assumed to hold,
so that within any group of examinees all characterized by the same value of the latent variable
$\theta$, or its transformation $\tau$, the distributions of the discrete item responses are all independent
of each other. Let $\varphi^*(k_g, \tau, v)$ denote the tri-variate density function of $k_g$, $\tau$ and $v$. Thus
(3.2) can be rewritten as

$$P_{k_g}(\theta) = \frac{\sum_v \varphi^*(k_g, \tau, v)}{\sum_{i \in K_g} \sum_v \varphi^*(i, \tau, v)} . \tag{3.3}$$


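There are many variations of the expression of the right-hand side of (3.3), and one of them is

$$P_{k_g}(\theta) = \frac{\sum_v \varsigma^*(\tau \mid k_g, v) \; \mathrm{prob.}[v \cap k_g]}{\sum_v \phi^*(\tau \mid v) \; \mathrm{prob.}(v)} = \frac{\sum_v \varsigma^*(\tau \mid k_g, v) \; \mathrm{prob.}[v \cap k_g]}{\sum_{i \in K_g} \sum_v \varsigma^*(\tau \mid i, v) \; \mathrm{prob.}[v \cap i]} , \tag{3.4}$$

where $\phi^*(\tau \mid v)$ is the conditional density function of $\tau$, given $v$, and $\varsigma^*(\tau \mid k_g, v)$ is the
conditional density function of $\tau$, given $k_g$ and $v$, which are provided by

$$\phi^*(\tau \mid v) = \frac{f^*(\tau) \; \mathrm{prob.}[v \mid \tau]}{\mathrm{prob.}(v)} \tag{3.5}$$

and

$$\varsigma^*(\tau \mid k_g, v) = \frac{\xi^*(k_g, \tau) \; \mathrm{prob.}[v \mid \tau]}{\mathrm{prob.}[v \cap k_g]} , \tag{3.6}$$

respectively.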

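Note that the joint density function, $\xi^*(k_g, \tau)$, of $k_g$ and $\tau$ can be written, from (3.6), as

$$\xi^*(k_g, \tau) = \frac{\sum_v \varsigma^*(\tau \mid k_g, v) \; \mathrm{prob.}[v \cap k_g]}{\sum_v \mathrm{prob.}[v \mid \tau]} ,$$

and then from this and (3.5) the density function of $\tau$ is given by

$$f^*(\tau) = \frac{\sum_v \phi^*(\tau \mid v) \; \mathrm{prob.}(v)}{\sum_v \mathrm{prob.}[v \mid \tau]} = \frac{\sum_{i \in K_g} \sum_v \varsigma^*(\tau \mid i, v) \; \mathrm{prob.}[v \cap i]}{\sum_v \mathrm{prob.}[v \mid \tau]} .$$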


[III.2]  Bivariate  P.D.F.  Approach 

A direct approach to (3.3) or (3.4) is extremely difficult, for it requires good estimates of
$\varphi^*(i, \tau, v)$ in (3.3), or of the corresponding conditional densities and probabilities in (3.4),
for all combinations of $i \in K_g$ and $v$, in addition to the set of
$P_{k_h}(\theta)$’s for the $n$ Old Test items. Some indirect approach is needed, therefore, to make use
of (3.3) or (3.4).

The method called Bivariate P.D.F. Approach (Samejima, 1981) is an indirect approach
based on (3.3), in which p.d.f. stands for the probability density function of $\tau$ and its
maximum likelihood estimate $\hat\tau$ obtained from the responses to the Old Test items. In this
approach, the estimator of the operating characteristic is defined by

$$\hat{P}_{k_g}(\theta) = \frac{\sum_{\hat\tau} \hat\varphi(k_g, \tau, \hat\tau)}{\sum_{i \in K_g} \sum_{\hat\tau} \hat\varphi(i, \tau, \hat\tau)} , \tag{3.7}$$

where $\varphi(k_g, \tau, \hat\tau)$ is the tri-variate density function of $k_g$, $\tau$ and $\hat\tau$, $\hat\varphi(k_g, \tau, \hat\tau)$ indicates the
estimate of $\varphi(k_g, \tau, \hat\tau)$, and $\sum_{\hat\tau}$ means the summation over all equally spaced values of $\hat\tau$ for
which not all estimated bivariate densities, $\hat\varphi(i, \tau, \hat\tau)$, are practically nil. It is noted that, in
(3.7), $\hat\tau$ replaces the response pattern $v$ in (3.3) and is treated as a continuous variable, and
the ratio in the right-hand side of (3.7) approximates the ratio of the integration of $\varphi(k_g, \tau, \hat\tau)$
with respect to $\hat\tau$ and the sum total of the integration of $\varphi(i, \tau, \hat\tau)$ over all $i \in K_g$. The
question is how to estimate $\varphi(i, \tau, \hat\tau)$ for all $i \in K_g$. To make it possible we need a specific
transformation of $\theta$ to $\tau$, which makes use of the test information function of the Old Test,
and allows us to enjoy the benefit of mathematical simplicity.

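The item response information function (Samejima, 1969, 1972) is defined by

$$I_{k_h}(\theta) = -\frac{\partial^2}{\partial\theta^2} \log P_{k_h}(\theta) = \left[ \frac{\frac{\partial}{\partial\theta} P_{k_h}(\theta)}{P_{k_h}(\theta)} \right]^2 - \frac{\frac{\partial^2}{\partial\theta^2} P_{k_h}(\theta)}{P_{k_h}(\theta)} , \tag{3.8}$$

and the item information function is defined as the conditional expectation of $I_{k_h}(\theta)$ given
$\theta$, so that

$$I_h(\theta) = E[I_{k_h}(\theta) \mid \theta] = \sum_{k_h} I_{k_h}(\theta) \, P_{k_h}(\theta) . \tag{3.9}$$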

Note that this item information function includes Birnbaum’s item information function for
the dichotomous test item (Birnbaum, 1968) as a special case. The operating characteristic,
$P_v(\theta)$, of the response pattern $v$ is defined as the conditional probability of $v$, given $\theta$. Thus
the operating characteristic of a given response pattern becomes the product of the operating
characteristics of the item response categories contained in that response pattern, so that

$$P_v(\theta) = \prod_{k_h \in v} P_{k_h}(\theta) \tag{3.10}$$

by virtue of local independence. The response pattern information function, $I_v(\theta)$, (Samejima,
1972) is given by

$$I_v(\theta) = -\frac{\partial^2}{\partial\theta^2} \log P_v(\theta) = \sum_{k_h \in v} I_{k_h}(\theta) , \tag{3.11}$$

and the test information function, $I(\theta)$, is defined as the conditional expectation of $I_v(\theta)$,
given $\theta$, and from (3.8), (3.9), (3.10) and (3.11)

$$I(\theta) = E[I_v(\theta) \mid \theta] = \sum_v I_v(\theta) \, P_v(\theta) = \sum_{h=1}^{n} I_h(\theta) .$$

The transformation of $\theta$ to $\tau$ is given by

$$\tau = C_1^{-1} \int_{-\infty}^{\theta} [I(t)]^{1/2} \, dt + C_0 , \tag{3.12}$$

where $C_0$ is an arbitrary constant for adjusting the origin of $\tau$, and $C_1$ is an arbitrary
constant which equals the square root of the test information function, $I^*(\tau)$, of $\tau$.
This transformation will be simplified and will become more manageable if a polynomial
approximation to the square root of the test information function, $[I(\theta)]^{1/2}$, for the meaningful
interval of $\theta$ is used, following the least squared errors principle. This can be accomplished by
using the method of moments (see Samejima & Livingston, 1979). Thus (3.12) can be changed
to the form

$$\tau \doteq \frac{1}{C_1} \sum_{k=0}^{m} \frac{\alpha_k}{k+1} \, \theta^{k+1} + C_0 = \sum_{k=0}^{m+1} \alpha_k^* \, \theta^k ,$$

where $\alpha_k$ $(k = 0, 1, \ldots, m)$ is the $k$-th coefficient of the polynomial of degree $m$ approximating
the square root of $I(\theta)$, and $\alpha_k^*$ is the $k$-th coefficient of the polynomial of degree $(m+1)$
transforming $\theta$ to $\tau$, which is given by

$$\alpha_k^* = \begin{cases} C_0 & k = 0 \\[4pt] \dfrac{\alpha_{k-1}}{C_1 \, k} & k = 1, 2, \ldots, m+1 . \end{cases}$$
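
The conversion from the coefficients of the fitted polynomial for the square root of the test information function to the coefficients of the transformation polynomial can be sketched as follows. The coefficient values and the choice of $C_1$ are hypothetical, and the least-squares fit itself (method of moments) is assumed to have been done already.

```python
def transformation_coefficients(alpha, c1, c0=0.0):
    """Coefficients of the degree-(m+1) polynomial taking theta to tau.

    alpha[k] is the k-th coefficient of the degree-m polynomial approximating
    the square root of the test information function.
    """
    return [c0] + [alpha[k - 1] / (c1 * k) for k in range(1, len(alpha) + 1)]


def evaluate_polynomial(coefficients, x):
    """Evaluate sum_k coefficients[k] * x**k by Horner's rule."""
    result = 0.0
    for c in reversed(coefficients):
        result = result * x + c
    return result


alpha = [2.0, 0.0, -0.3]    # hypothetical fit of the square root of I(theta), m = 2
c1 = 2.0                    # hypothetical choice of the constant C1
alpha_star = transformation_coefficients(alpha, c1)
tau_at_one = evaluate_polynomial(alpha_star, 1.0)

# The slope of tau with respect to theta should equal sqrt(I(theta)) / C1.
h = 1e-4
slope = (evaluate_polynomial(alpha_star, 1.0 + h)
         - evaluate_polynomial(alpha_star, 1.0 - h)) / (2.0 * h)
expected_slope = evaluate_polynomial(alpha, 1.0) / c1
```

The slope check reflects the defining property of the transformation: on the $\tau$ scale the square root of the test information is the constant $C_1$.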


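Adopting the maximum likelihood estimator, $\hat\tau$, as our estimator, its asymptotic normality
with the two parameters, $\tau$ and $(1/C_1)$, is used as the approximation to the conditional
distribution of $\hat\tau$, given $\tau$ (Samejima, 1977a). Note that the second parameter, which equals
the standard deviation of the conditional distribution, is constant for all $\tau$. By virtue of this
constancy the first through fourth conditional moments of $\tau$, given $\hat\tau$, can be obtained from
the density function, $g(\hat\tau)$, of $\hat\tau$ and the constant $C_1$, by the following four formulas.

$$E(\tau \mid \hat\tau) = \hat\tau + C_1^{-2} \, \frac{d}{d\hat\tau} \log g(\hat\tau) \tag{3.13}$$

$$\mathrm{Var.}(\tau \mid \hat\tau) = C_1^{-2} \left\{ 1 + C_1^{-2} \, \frac{d^2}{d\hat\tau^2} \log g(\hat\tau) \right\} \tag{3.14}$$

$$E[\{\tau - E(\tau \mid \hat\tau)\}^3 \mid \hat\tau] = C_1^{-6} \, \frac{d^3}{d\hat\tau^3} \log g(\hat\tau) \tag{3.15}$$

$$E[\{\tau - E(\tau \mid \hat\tau)\}^4 \mid \hat\tau] = C_1^{-4} \left\{ 3 + 6 \, C_1^{-2} \, \frac{d^2}{d\hat\tau^2} \log g(\hat\tau) + 3 \, C_1^{-4} \left[ \frac{d^2}{d\hat\tau^2} \log g(\hat\tau) \right]^2 + C_1^{-4} \, \frac{d^4}{d\hat\tau^4} \log g(\hat\tau) \right\} \tag{3.16}$$

The first of these formulas is commonly seen in convolution transform (e.g., Hirschman &
Widder), and used in statistical astronomy (Trumpler & Weaver, 1953), for example.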
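
The four moment formulas can be evaluated directly once the derivatives of $\log g(\hat\tau)$ are available. The sketch below takes those derivatives as inputs (in practice they would come from the fitted polynomial for $g$) and checks the result against a case with a known answer. The check case, a normal distribution of $\tau$ with unit variance and $C_1 = 2$, is an assumption made only for illustration.

```python
def conditional_moments(tau_hat, c1, d1, d2, d3, d4):
    """Conditional moments of tau given tau_hat, following (3.13)-(3.16).

    d1..d4 are the first four derivatives of log g evaluated at tau_hat.
    Returns (mean, variance, third central moment, fourth central moment).
    """
    s2 = 1.0 / c1 ** 2                                   # variance of tau_hat given tau
    mean = tau_hat + s2 * d1                             # (3.13)
    variance = s2 * (1.0 + s2 * d2)                      # (3.14)
    third = s2 ** 3 * d3                                 # (3.15)
    fourth = s2 ** 2 * (3.0 + 6.0 * s2 * d2
                        + 3.0 * s2 ** 2 * d2 ** 2
                        + s2 ** 2 * d4)                  # (3.16)
    return mean, variance, third, fourth


# Check case (assumed): tau ~ N(0, 1) and C1 = 2, so g is N(0, 1.25) and
# log g has derivatives -x / 1.25, -1 / 1.25, 0 and 0 at x = tau_hat.
g_var = 1.0 + 1.0 / 2.0 ** 2
mean, variance, third, fourth = conditional_moments(
    1.0, 2.0, -1.0 / g_var, -1.0 / g_var, 0.0, 0.0)
```

In this normal-normal case the conditional distribution of $\tau$ is itself normal with mean $0.8$ and variance $0.2$ at $\hat\tau = 1$, so the third moment vanishes and the fourth equals three times the squared variance.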

The marginal density function, $g(\hat\tau)$, is not directly observable, but can be estimated
from the set of the maximum likelihood estimates, $\hat\tau_s$’s, of the individual parameters $\tau_s$’s,
where $s$ $(= 1, 2, \ldots, N)$ denotes an individual examinee in our sample. This can be done by
fitting a polynomial, following the least squared errors principle, using the method of moments
(Samejima & Livingston, 1979).

The conditional density $\phi_{k_g}(\tau \mid \hat\tau)$ can be estimated through these estimates of the
conditional moments, by replacing $g(\hat\tau)$ by $g_{k_g}(\hat\tau)$, the marginal density function for the subpopulation
of examinees who share a specific response $k_g$ to the target item $g$, in (3.13) through (3.16).

It will be appropriate to have the estimated conditional moments select a functional formula
for $\phi_{k_g}(\tau \mid \hat\tau)$ for each equally spaced $\hat\tau$. One of the Pearson System distributions (e.g.,
Elderton and Johnson, 1969) will be selected for each of these conditional distributions, using
the two coefficients, $\beta_1$ and $\beta_2$, and Pearson’s criterion $\kappa$, which can be written as

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} , \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} , \qquad \kappa = \frac{\beta_1 (\beta_2 + 3)^2}{4 (2\beta_2 - 3\beta_1 - 6)(4\beta_2 - 3\beta_1)} .$$

In these three formulas, $\mathrm{Var.}(\tau \mid \hat\tau)$, $E[\{\tau - E(\tau \mid \hat\tau)\}^3 \mid \hat\tau]$ and $E[\{\tau - E(\tau \mid \hat\tau)\}^4 \mid \hat\tau]$
are substituted for $\mu_2$, $\mu_3$ and $\mu_4$, respectively, which are obtained by formulas (3.14),
(3.15) and (3.16), for the values of $\hat\tau$ which are appropriately selected with reasonably small,
equal steps. If $\beta_1$ and $\beta_2$ turn out to be close to 0 and 3, respectively, then a normal
density function may be used as the approximation to $\phi_{k_g}(\tau \mid \hat\tau)$. Otherwise, the criterion
$\kappa$ will lead to one of the Pearson System distributions, that is, if $\kappa < 0$, then Pearson’s
Type 1 distribution, which means the asymmetric $\beta$-distribution, will be selected, if $\kappa = 0$,
$\beta_1 = 0$ and $\beta_2 < 3$, then Type 2 distribution, which is the symmetric $\beta$-distribution, will
be assigned, if $0 < \kappa < 1$, then Type 4 distribution, if $\kappa > 1$, then Type 6 distribution,
etc. Multiplying each Pearson System density function thus obtained by the estimated joint
density function $\hat{g}_{k_g}(\hat\tau) \, (N_{k_g}/N)$, where $N$ is the total number of examinees and $N_{k_g}$ is the
number of examinees who share the same response $k_g$ to the target item $g$, $\hat\varphi(k_g, \tau, \hat\tau)$ will
be obtained.
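
The branching on $\beta_1$, $\beta_2$ and $\kappa$ described above can be sketched as a small selection function. This is an illustration only: the closeness threshold `tol` for the normal case is a hypothetical value (the text says only "close to 0 and 3"), and the cases the text covers with "etc." are lumped under the label "other".

```python
def pearson_selection(mu2, mu3, mu4, tol=0.05):
    """Return (beta1, beta2, kappa, label) for central moments mu2, mu3, mu4.

    tol is a hypothetical threshold for "close to 0 and 3"; kappa is None
    when the normal branch is taken or when its denominator vanishes.
    """
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    if beta1 < tol and abs(beta2 - 3.0) < tol:
        return beta1, beta2, None, "normal"
    denominator = 4.0 * (2.0 * beta2 - 3.0 * beta1 - 6.0) * (4.0 * beta2 - 3.0 * beta1)
    if denominator == 0.0:
        return beta1, beta2, None, "other"
    kappa = beta1 * (beta2 + 3.0) ** 2 / denominator
    if kappa < 0.0:
        label = "Type 1"          # asymmetric beta-distribution
    elif kappa == 0.0 and beta1 == 0.0 and beta2 < 3.0:
        label = "Type 2"          # symmetric beta-distribution
    elif 0.0 < kappa < 1.0:
        label = "Type 4"
    elif kappa > 1.0:
        label = "Type 6"
    else:
        label = "other"           # cases the text leaves under "etc."
    return beta1, beta2, kappa, label


normal_case = pearson_selection(1.0, 0.0, 3.0)
uniform_case = pearson_selection(1.0 / 12.0, 0.0, 1.0 / 80.0)   # moments of U(0, 1)
skewed_case = pearson_selection(1.0, 0.59628, 2.88)             # roughly Beta(2, 5), standardized
```

The normal check is made before $\kappa$ is computed because the denominator of $\kappa$ is zero when $\beta_1 = 0$ and $\beta_2 = 3$.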

It is noted that in estimating $\varphi(k_g, \tau, \hat\tau)$ the Pearson System distributions are used, which
are parametric. In this sense, the approaches that are introduced in the present paper are not
strictly nonparametric, and the estimated conditional moments introduced earlier are used as
the estimated parameters. In fact, if the interval $(-3.55, 3.55)$ is used with the step width
0.1 for $\hat\tau$ for a five-category response item, for example, then we are using as many as 1,420
estimated parameters for the single item.
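
The count of 1,420 can be reproduced with simple arithmetic. The sketch below rests on an interpretation that is not spelled out in the text: 71 equally spaced values of $\hat\tau$ on the interval, four conditional moments per value (formulas (3.13) through (3.16)), and five response categories. The function name is hypothetical.

```python
def count_estimated_parameters(lower, upper, step, n_categories, n_moments=4):
    """Number of estimated conditional moments used for one item.

    Assumes one set of n_moments moments per equally spaced value of tau-hat
    and per response category.
    """
    n_points = round((upper - lower) / step)   # equally spaced values of tau-hat
    return n_points * n_moments * n_categories


n_parameters = count_estimated_parameters(-3.55, 3.55, 0.1, n_categories=5)
```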

Since this has to be done individually for each item $g$ and for each and every discrete
response $k_g$, and the process must be repeated as many times as the number of discrete item
response categories for each and every item, it requires a substantial amount of CPU time.
This is a drawback of this approach.

[III.3]  Conditional  P.D.F.  Approach 

In  the  Conditional  P.D.F.  Approach,  several  different  procedures,  which  include  Simple 
Sum  Procedure,  Weighted  Sum  Procedure,  Proportioned  Sum  Procedure,  Differential  Weight 
Procedure,  etc.,  have  been  considered  (see  Samejima,  1981).  In  this  section,  Simple  Sum 
Approach  and  Differential  Weight  Approach  will  be  introduced. 


[III.3.1]  SIMPLE  SUM  APPROACH 

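An estimator of $P_{k_g}(\theta)$ is defined as

$$\hat{P}_{k_g}(\theta) = \frac{\frac{1}{N} \sum_{s \in k_g} \phi^*(\tau \mid v_s)}{\frac{1}{N} \sum_{s=1}^{N} \phi^*(\tau \mid v_s)} = \frac{\sum_{s \in k_g} \phi^*(\tau \mid v_s)}{\sum_{s=1}^{N} \phi^*(\tau \mid v_s)} . \tag{3.17}$$

This estimator does not require $f^*(\tau)$ nor $\mathrm{prob.}[k_g \mid \tau]$, so it will be easy to use in practical
situations. It is not a consistent estimator, however.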

Proof: Consistency of the denominator:

Let $N_v$ denote the number of examinees in our sample who share the same, specific response
pattern $v$ on the Old Test. Then

$$\frac{1}{N} \sum_{s=1}^{N} \phi^*(\tau \mid v_s) = \frac{1}{N} \sum_v \sum_{s:\, v_s = v} \phi^*(\tau \mid v_s) = \sum_v \frac{N_v}{N} \, \phi^*(\tau \mid v) \longrightarrow \sum_v \phi^*(\tau \mid v) \; \mathrm{prob.}(v) .$$

Thus it has been demonstrated that the denominator is consistent.

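Proof: Inconsistency of the numerator:

Let $N_{v \cap k_g}$ be the number of examinees who share the same, specific response pattern $v$
and the same, specific discrete response $k_g$ to the target item $g$. From (3.5) and (3.6)

$$\varsigma^*(\tau \mid k_g, v) = \frac{\phi^*(\tau \mid v) \; \mathrm{prob.}[k_g \mid \tau]}{\mathrm{prob.}[k_g \mid v]} . \tag{3.18}$$

From this

$$\frac{1}{N} \sum_{s \in k_g} \phi^*(\tau \mid v_s) = \frac{1}{N} \sum_v \sum_{s \in k_g:\, v_s = v} \phi^*(\tau \mid v_s) = \sum_v \frac{N_{v \cap k_g}}{N} \, \phi^*(\tau \mid v)$$

$$\longrightarrow \sum_v \phi^*(\tau \mid v) \; \mathrm{prob.}(v \cap k_g) = \sum_v \varsigma^*(\tau \mid k_g, v) \; \mathrm{prob.}[v \cap k_g] \, \frac{\mathrm{prob.}[k_g \mid v]}{\mathrm{prob.}[k_g \mid \tau]} . \tag{3.19}$$

Thus the numerator is not consistent because of the nuisance factor shown as the ratio at the
end of the last expression of (3.19).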

Although this estimator is inconsistent, a direct approach to (3.17) is possible, for it simply
requires the set of $P_{k_h}(\theta)$’s for the $n$ Old Test items. It has been named full information
simple sum formula, and tested, by Levine and Williams (Levine & Williams, 1991). The
results show fairly good fits to the true operating characteristics, especially with large sample
sizes.
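
A direct use of (3.17) can be sketched with simulated data. The sketch below is a simplified illustration, not the procedure tested by Levine and Williams: it assumes a dichotomous logistic Old Test with hypothetical item parameters, takes $\tau = \theta$ (identity transformation), uses a standard normal density on a grid of $\tau$ values, and computes $\phi^*(\tau \mid v_s)$ from (3.5) under local independence.

```python
import math
import random


def logistic(theta, a, b):
    """Assumed operating characteristic of the correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def posterior_on_grid(pattern, items, grid, prior):
    """phi*(tau | v) of (3.5), normalized over the grid points."""
    weights = []
    for tau, f in zip(grid, prior):
        like = 1.0
        for u, (a, b) in zip(pattern, items):
            p = logistic(tau, a, b)
            like *= p if u == 1 else 1.0 - p
        weights.append(f * like)
    total = sum(weights)
    return [w / total for w in weights]


def simple_sum_estimate(patterns, target_responses, items, grid, prior, categories):
    """Estimator (3.17) evaluated at every grid point, for each category."""
    numer = {k: [0.0] * len(grid) for k in categories}
    denom = [0.0] * len(grid)
    for v, k in zip(patterns, target_responses):
        phi = posterior_on_grid(v, items, grid, prior)
        for j, value in enumerate(phi):
            numer[k][j] += value
            denom[j] += value
    return {k: [n / d for n, d in zip(numer[k], denom)] for k in categories}


rng = random.Random(0)
grid = [-4.0 + 0.1 * j for j in range(81)]
prior = [math.exp(-0.5 * t * t) for t in grid]
items = [(1.0 + 0.05 * h, -2.0 + 0.2 * h) for h in range(20)]   # hypothetical Old Test
target = (1.5, 0.0)                                              # hypothetical target item
patterns, target_responses = [], []
for _ in range(500):
    theta = rng.gauss(0.0, 1.0)
    patterns.append([1 if rng.random() < logistic(theta, a, b) else 0
                     for a, b in items])
    target_responses.append(1 if rng.random() < logistic(theta, *target) else 0)
estimate = simple_sum_estimate(patterns, target_responses, items, grid, prior,
                               categories=(0, 1))
```

By construction the estimates for the response categories sum to one at every grid point, and the estimated curve for the correct response can be compared with the generating logistic curve to see the flattening discussed later in this section.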

In the Simple Sum Procedure of the Conditional P.D.F. Approach (Samejima, 1981, 1988,
1990b)

$$\hat{P}_{k_g}(\theta) = \frac{\sum_{s \in k_g} \hat\phi(\tau \mid \hat\tau_s)}{\sum_{s=1}^{N} \hat\phi(\tau \mid \hat\tau_s)} \tag{3.20}$$

is adopted as our estimator of $P_{k_g}(\theta)$, which is almost identical with (3.17) except for the
replacement of $v_s$ by the maximum likelihood estimate $\hat\tau_s$ of the individual examinee $s$. Note
that $\hat\tau_s$ is a function of $v_s$, but a one-to-one correspondence between $\hat\tau_s$ and $v_s$ may not exist.
If, for example, two Old Test items are equivalent, indicating that they share an identical set of
operating characteristics, then two response patterns, in which the discrete responses to these
two items are exchanged and the responses to all the other items are identical, will be distinct
from each other, but will share the same $\hat\tau_s$. The lack of a one-to-one mapping between $v_s$
and $\hat\tau_s$ will not affect the characteristics of the estimator (3.17) by this replacement, however.
The estimated conditional density, $\hat\phi(\tau \mid \hat\tau_s)$, in (3.20) can be specified by using the estimated
conditional moments of $\tau$, given $\hat\tau_s$, which are given by (3.13) through (3.16), in a similar
manner as in the Bivariate P.D.F. Approach, by substituting $\hat\tau_s$ for the equally spaced $\hat\tau$ in
the Bivariate P.D.F. Approach. Note that this formula enables us to estimate all the operating
characteristics of the discrete item responses for many different items almost simultaneously,
which provides us with the benefit of economy in CPU time.

From both theory and practice, with many sets of data very high frequencies for the normal
density function as approximations to $\hat\phi(\tau \mid \hat\tau_s)$ are expected as the results of the above
branching. When this is the case, Normal Approach Method (Samejima, 1981, 1988, 1990b),
in which

$$\hat\phi(\tau \mid \hat\tau_s) = \frac{1}{[2\pi \, \mathrm{Var.}(\tau \mid \hat\tau_s)]^{1/2}} \exp\left\{ -\frac{[\tau - E(\tau \mid \hat\tau_s)]^2}{2 \, \mathrm{Var.}(\tau \mid \hat\tau_s)} \right\}$$

is adopted as the approximation to $\hat\phi(\tau \mid \hat\tau_s)$, can be used. It can easily be seen that when
$f^*(\tau)$ is normal this is approximately the case for all $\hat\tau_s$, and when it is uniform this is
approximately the case for a wide range of $\hat\tau$.
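
The Simple Sum Procedure (3.20) combined with the Normal Approach Method can be sketched as follows. To keep the example short, the density $g(\hat\tau)$ is assumed to be normal with known mean and variance, so that (3.13) and (3.14) have closed forms; in the report $g$ is instead estimated by a polynomial fitted by the method of moments. The function names and the toy data are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def normal_simple_sum(tau, tau_hats, responses, category, c1, g_mean, g_var):
    """Estimator (3.20) at one value of tau, with normal conditional densities.

    Assumes g(tau-hat) is N(g_mean, g_var) with g_var > 1 / c1**2, so that
    d/dx log g = -(x - g_mean) / g_var and d2/dx2 log g = -1 / g_var.
    """
    numer = 0.0
    denom = 0.0
    for t_hat, k in zip(tau_hats, responses):
        mean = t_hat - (t_hat - g_mean) / (g_var * c1 ** 2)     # (3.13)
        var = (1.0 - 1.0 / (g_var * c1 ** 2)) / c1 ** 2          # (3.14)
        density = normal_pdf(tau, mean, var)
        denom += density
        if k == category:
            numer += density
    return numer / denom


# Toy data: five examinees with their maximum likelihood estimates and
# dichotomous responses to the target item.
tau_hats = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [0, 0, 1, 1, 1]
p_low = normal_simple_sum(-1.0, tau_hats, responses, 1, c1=2.0, g_mean=0.0, g_var=1.25)
p_high = normal_simple_sum(1.0, tau_hats, responses, 1, c1=2.0, g_mean=0.0, g_var=1.25)
p_other = normal_simple_sum(1.0, tau_hats, responses, 0, c1=2.0, g_mean=0.0, g_var=1.25)
```

Because each examinee contributes one normal density regardless of the target item, the same set of densities can be reused for every target item, which is the source of the economy in CPU time noted above.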

This somewhat indirect simple sum approach includes several smoothing devices in the
process by using polynomials obtained by the method of moments, etc. This makes the resulting
estimated curves smoother than those obtained by the direct approach, a convenient feature
when our sample size is relatively small. It has been shown that with simulated data this
method provides us with the estimated operating characteristics which are very close to the
truth curves (see Samejima, 1981, 1988).

The last factor in (3.19), the ratio of two conditional probabilities, enables us to make
interesting observations. If our Old Test gives us a substantially large amount of information
for the range of $\tau$ of interest, then the conditional distribution of $\tau$, given $v$, becomes
closer to a one-point distribution at a specific value of $\tau$, and the above ratio, or weighting
factor, will become closer to unity at that value of $\tau$. In such a case, consistency almost exists
in the numerator of (3.17) and thus for the estimator itself. This means that the success of
this method depends upon our choice of the Old Test, that is, whether or not Old Test items
satisfying the above condition are selected. When this is not the case, however, the
estimated operating characteristics will have specific tendencies. If, for example, the truth
curve is a steep, monotonically increasing one, then a substantially flatter estimated operating
characteristic will be obtained, for the nuisance factor, that is, the ratio of $\mathrm{prob.}[k_g \mid \hat\tau_s]$ to
$\mathrm{prob.}[k_g \mid \tau]$, will act as a smoothing factor.

It has been observed that the Simple Sum Procedure combined with the Normal Approach
Method works well especially in the on-line item calibration of adaptive testing (see Samejima,
1981, 1988, 1990b). The reason is obvious from the above observation, since in the
response pattern

\[
v_s^* = (v_s, k_g)
\]

$v_s$ is based upon a subtest of the itempool tailored for each individual examinee, so that
the conditional distribution of $\tau$, given $v_s$, becomes closer to the one-point distribution at
the true individual parameter, $\tau_s$.


[III.3.2]  DIFFERENTIAL  WEIGHT  APPROACH 


When there already is a reasonably good estimate of $P_{k_g}(\theta)$ for each $k_g$, another approach,
which in theory provides us with more accurate estimation, is possible. This approach is called
the Differential Weight Approach.

The differential weight function, $W_{k_g}(\tau; v)$, is defined by

\[
W_{k_g}(\tau; v) = \frac{\mathrm{prob.}[k_g \mid \tau]}{\mathrm{prob.}[k_g \mid v]} . \tag{3.21}
\]

Then from this and (3.18)

\[
\xi^*(\tau \mid k_g, v) = \phi^*(\tau \mid v)\, W_{k_g}(\tau; v) . \tag{3.22}
\]

Substituting (3.22) into (3.4), $P_{k_g}(\theta)$ can be written as

\[
P_{k_g}(\theta) = \frac{\sum_{v} W_{k_g}(\tau; v)\, \phi^*(\tau \mid v)\, \mathrm{prob.}[v \cap k_g]}
{\sum_{k_g} \sum_{v} W_{k_g}(\tau; v)\, \phi^*(\tau \mid v)\, \mathrm{prob.}[v \cap k_g]} .
\]

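As before, let $s$ (= 1, 2, ..., $N$) be a subject or an examinee in our sample. Define a consistent
estimator of $P_{k_g}(\theta)$ by

\[
\hat{P}_{k_g}(\theta)
= \frac{\frac{1}{N} \sum_{s \in k_g} W_{k_g}(\tau; v_s)\, \phi^*(\tau \mid v_s)}
{\frac{1}{N} \sum_{k_g} \sum_{s \in k_g} W_{k_g}(\tau; v_s)\, \phi^*(\tau \mid v_s)}
= \frac{\sum_{s \in k_g} W_{k_g}(\tau; v_s)\, \phi^*(\tau \mid v_s)}
{\sum_{k_g} \sum_{s \in k_g} W_{k_g}(\tau; v_s)\, \phi^*(\tau \mid v_s)} . \tag{3.23}
\]

Proof: Consistency of the numerator:

\[
\frac{1}{N} \sum_{s \in k_g} W_{k_g}(\tau; v_s)\, \phi^*(\tau \mid v_s)
= \frac{1}{N} \sum_{v} \sum_{s \in k_g :\, v_s = v} W_{k_g}(\tau; v_s)\, \phi^*(\tau \mid v_s) \tag{3.24}
\]

\[
\longrightarrow \sum_{v} W_{k_g}(\tau; v)\, \phi^*(\tau \mid v)\, \mathrm{prob.}[v \cap k_g] .
\]

Thus it has been demonstrated that the numerator of (3.23) is consistent.

Proof: Consistency of the denominator:

From (3.24), straightforwardly

\[
\frac{1}{N} \sum_{k_g} \sum_{s \in k_g} W_{k_g}(\tau; v_s)\, \phi^*(\tau \mid v_s)
\longrightarrow \sum_{k_g} \sum_{v} W_{k_g}(\tau; v)\, \phi^*(\tau \mid v)\, \mathrm{prob.}[v \cap k_g] .
\]

Therefore,

\[
\hat{P}_{k_g}(\theta) \longrightarrow P_{k_g}(\theta) .
\]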
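
The ratio form of the estimator in (3.23) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name `differential_weight_estimate` is hypothetical, and the weights and conditional densities are assumed to be already tabulated on a common grid of $\tau$ values, one row per examinee, each row using the examinee's own response category.

```python
import numpy as np

def differential_weight_estimate(responses, weights, densities):
    """
    Ratio estimator in the spirit of (3.23), evaluated on a grid of tau values.

    responses : (N,) integer array, the category k_g chosen by each examinee s
    weights   : (N, T) array, W_{k_g}(tau_t; v_s) for each examinee's own category
    densities : (N, T) array, phi*(tau_t | v_s)
    Returns the sorted category labels and a (K, T) array of estimates.
    """
    responses = np.asarray(responses)
    contrib = np.asarray(weights, dtype=float) * np.asarray(densities, dtype=float)
    categories = np.unique(responses)
    # Numerator: sum over the examinees who gave response k_g.
    numerators = np.stack([contrib[responses == k].sum(axis=0) for k in categories])
    # Denominator: the same sum taken over all categories, i.e. over all examinees.
    denominator = contrib.sum(axis=0)
    return categories, numerators / denominator
```

Because the denominator is the sum of the numerators over all categories, the estimates add up to one at every grid point by construction.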

Direct approach is possible if there already exists a reasonably good estimate of prob.$[k_g \mid \tau]$
or $P_{k_g}(\theta)$ itself, in addition to the set of $P_{k_h}(\theta)$'s for the $n$ Old Test items. This can
be accomplished by using the estimated $P_{k_g}(\theta)$ obtained by the full information simple sum
formula.

Differential Weight Procedure of the Conditional P.D.F. Approach (Samejima, 1990a, 1990b)
can be executed as a supplementary process of the Simple Sum Procedure combined with, say,
the Normal Approach Method. The estimator of $P_{k_g}(\theta)$ is given by

\[
\hat{P}_{k_g}(\theta) = \frac{\sum_{s \in k_g} \hat{W}_{k_g}(\tau; \hat{\tau}_s)\, \hat{\phi}(\tau \mid \hat{\tau}_s)}
{\sum_{s=1}^{N} \hat{W}_{k_g}(\tau; \hat{\tau}_s)\, \hat{\phi}(\tau \mid \hat{\tau}_s)} , \tag{3.25}
\]

where $\hat{W}_{k_g}(\tau; \hat{\tau}_s)$ denotes the estimate of the differential weight function given by (3.21),
obtained by replacing $v$ by $\hat{\tau}_s$. Note that this formula includes $\hat{\phi}(\tau \mid \hat{\tau}_s)$, but not
$\hat{\xi}(\tau \mid k_g, \hat{\tau}_s)$. Since $\hat{\phi}(\tau \mid \hat{\tau}_s)$ has already been obtained in the Simple Sum Procedure,
all that is needed is to substitute the estimated operating characteristic of $k_g$ obtained by the
Simple Sum Procedure into the specification of the estimate of the differential weight function
$\hat{W}_{k_g}(\tau; \hat{\tau}_s)$. In so doing, it will
be advisable to modify the estimated $P_{k_g}(\theta)$ obtained by the Simple Sum Procedure before
using it in $\hat{W}_{k_g}(\tau; \hat{\tau}_s)$, if the Old Test has a range of $\theta$ where the test information function $I(\theta)$
assumes low values. In many cases this happens on very high levels, or on very low levels, of
$\theta$, or both, relative to the ability distribution of our sample. In such a case, modifications can
be made by extrapolating the portion of the estimated curve obtained in the interval of $\theta$
where the amount of test information provided by the Old Test is sufficiently large to the range
of $\theta$ where this is not the case. Using the estimated $P_{k_g}(\theta)$ thus modified in $\hat{W}_{k_g}(\tau; \hat{\tau}_s)$, the
reestimated operating characteristic will be obtained by (3.25).

To demonstrate how to use the Simple Sum and Differential Weight Procedures of the Conditional
P.D.F. Approach and to observe the results, part of a simulation study was introduced
in the paper. The data were simulated data, provided by the Office of Naval Research as the
initial itempool for the on-line item calibration research. There are one hundred hypothetical
dichotomous test items in the itempool, which were administered in the forms of conventional or
non-adaptive tests. None of these one hundred items follows any specific mathematical model,
and some of their item characteristic curves, or operating characteristics of the correct answer,
are monotone increasing, but some others are not.

These  one  hundred  items  were  divided  into  four  subtests  of  twenty-five  items  each,  which 
are  called  Subtests  A,  B,  C  and  D.  These  subtests  are  combined  into  six  pairs,  that  is,  AB,  AC, 
AD,  BC,  BD  and  CD  of  fifty  items  each.  Six  thousand  hypothetical  examinees  were  sampled 
from a population whose ability distribution is close to, but not quite equal to, $N(0, 1)$. One
thousand hypothetical examinees were assigned to each of the six pairs of subtests. Thus each
of  the  one  hundred  test  items  was  administered  to  three  thousand  hypothetical  examinees,  and 
there  were  one  thousand  examinees  who  tried  each  pair  of  test  items.  The  response  pattern  of 
each  examinee  was  produced  by  the  Monte  Carlo  method. 
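
The counts in this design can be checked with a short sketch. The item numbering (0 to 99, in contiguous blocks of twenty-five) and the names `SUBTESTS`, `PAIRS` and `examinees_per_item` are assumptions made for illustration only; the report does not say which items belong to which subtest.

```python
from itertools import combinations

# Hypothetical item numbering: four contiguous blocks of twenty-five items.
SUBTESTS = {
    "A": range(0, 25),
    "B": range(25, 50),
    "C": range(50, 75),
    "D": range(75, 100),
}
EXAMINEES_PER_PAIR = 1000

# All unordered pairs of subtests: AB, AC, AD, BC, BD, CD.
PAIRS = list(combinations(sorted(SUBTESTS), 2))

def examinees_per_item():
    """Count how many examinees were administered each item under the paired design."""
    counts = {item: 0 for items in SUBTESTS.values() for item in items}
    for pair in PAIRS:
        for name in pair:
            for item in SUBTESTS[name]:
                counts[item] += EXAMINEES_PER_PAIR
    return counts
```

Each subtest appears in three of the six pairs, which is why every item is seen by three thousand of the six thousand examinees.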

As  for  the  details  of  the  methods  and  the  results  of  the  comparisons,  the  reader  is  directed 
to  Samejima,  1990a,  or  to  the  paper  accepted  by  Psychometrika  (see  Section  1)  when  it  is 
published. 


IV. Acceleration Model

The  competency  space  approach  to  cognitive  assessment  eventually  requires  a  family  of 
mathematical  models  which  is  appropriate  for  modeling  cognitive  processes,  and  is  robust 
enough  to  continue  to  be  useful  as  research  goes  further  in  depth  and  precision.  To  answer 
this  necessity,  a  family  of  models,  called  acceleration  model ,  has  been  proposed  and  discussed 
during  this  research  period. 

[IV.1] Processing Functions

Suppose that a cognitive process, like problem solving, contains a finite or enumerable number
of steps. The graded item score $x_g$ (= 0, 1, ..., $m_g$) to (problem solving) item $g$ is assigned
to the individuals who have successfully completed up to the step $x_g$ but failed to complete
the step $(x_g + 1)$.

The processing function, $M_{x_g}(\theta)$, is defined as the joint conditional probability with which
the individual of latent trait level $\theta$ completes the step $x_g$ successfully, under the conditions
that:


1. the individual's ability level is $\theta$, and

2. the steps up to $(x_g - 1)$ have already been completed successfully.


It is assumed that $M_{x_g}(\theta)$ is non-decreasing in $\theta$, and

\[
M_{x_g}(\theta) =
\begin{cases}
1 & \text{for } x_g = 0 \text{ (no step yet, or starting point)} \\
0 & \text{for } x_g = m_g + 1 \text{ (cannot be attained)} ,
\end{cases}
\]

for all $\theta$, where $(m_g + 1)$ is the hypothesized graded item score adjacent to and above $m_g$.
The fundamental theoretical framework (Samejima, 1972) is given by

\[
P_{x_g}(\theta) = \prod_{u \le x_g} M_u(\theta)\, [1 - M_{(x_g+1)}(\theta)] , \tag{4.1}
\]

where $P_{x_g}(\theta)$ is the operating characteristic of the item score $x_g$, that is,

\[
P_{x_g}(\theta) = \mathrm{prob.}[X_g = x_g \mid \theta] .
\]

The cumulative operating characteristic, $P^*_{x_g}(\theta)$, is the conditional probability with which the
individual of latent trait $\theta$ completes the cognitive process successfully up to the step $x_g$, or
further, so that it can also be expressed in terms of processing functions by

\[
P^*_{x_g}(\theta) = \prod_{u \le x_g} M_u(\theta) , \tag{4.2}
\]

and from (4.1) and (4.2)

\[
P_{x_g}(\theta) = P^*_{x_g}(\theta) - P^*_{(x_g+1)}(\theta) .
\]
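
The relations among processing functions, cumulative operating characteristics and operating characteristics can be sketched as below. The function name `operating_characteristics` and the array layout are assumptions for illustration; the boundary conventions (the processing function is unity at score 0 and zero at score $m_g + 1$) follow the framework just described.

```python
import numpy as np

def operating_characteristics(processing):
    """
    processing : (m_g, T) array holding M_1, ..., M_{m_g} on a grid of T theta values.
    Returns (p, p_star): p has rows x_g = 0..m_g, p_star has rows x_g = 0..m_g + 1.
    """
    processing = np.asarray(processing, dtype=float)
    n_theta = processing.shape[1]
    # Boundary conventions: M_0 = 1 and M_{m_g + 1} = 0 for all theta.
    full = np.vstack([np.ones((1, n_theta)), processing, np.zeros((1, n_theta))])
    # Cumulative operating characteristic: product of M_u over u <= x_g.
    p_star = np.cumprod(full, axis=0)
    # Operating characteristic: reach step x_g, then fail step x_g + 1.
    p = p_star[:-1] * (1.0 - full[1:])
    return p, p_star
```

Since the cumulative curve equals one at score 0 and zero at score $m_g + 1$, the differences telescope and the operating characteristics sum to one at every $\theta$.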

[IV.2] Criteria for Evaluating Graded Response or Partial Credit Models

Curve  fitting  and  mathematical  modeling  are  two  different  things.  Even  if  a  model  fits 
data  well,  it  cannot  be  an  ultimate  reason  for  accepting  the  model.  Instead,  the  following  five 
features  have  been  considered  as  criteria  for  evaluating  models. 

1. The principle behind the model and the set of accompanying assumptions agree with
the psychological reality in question. This is by far the most important criterion.

2.  The  model  provides  additivity  in  the  operating  characteristics  of  the  item  scores  or 
degrees  of  attainment.  Additivity  holds  if  the  operating  characteristics  belong  to 
the  same  mathematical  model  under  finer  recategorizations  and  combinings  of  two  or 
more  categories  together.  This  is  the  second  most  important  criterion.  Note  that 
graded  item  scores  or  partial  credits  are  more  or  less  arbitrary,  that  is,  it  is  a  common 
practice  to  change  the  grades  A,  B,  C,  D,  F  to  Pass,  Fail,  for  example.  Also,  with  the 
advancement  of  computer  technologies,  it  is  quite  possible  to  obtain  more  abundant 
information from the individual's performance in computerized experiments as research
proceeds, and thus we need finer recategorizations of the whole cognitive process.

3.  The  model  can  be  naturally  generalized  to  a  continuous  response  model.  This  criterion 
is  a  natural  extension  of  additivity. 

4. The model satisfies the unique maximum condition (Samejima, 1969, 1972). Satisfaction
of this condition assures that the likelihood function of any response pattern
consisting of such response categories has a unique local or terminal maximum.

5. The model provides the ordered modal points of the operating characteristics in accordance
with the item scores.
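
To illustrate what additivity (criterion 2) asks for, the following short derivation combines two adjacent item scores, using only the relation between operating characteristics and cumulative operating characteristics from Section IV.1. It is a sketch of the idea, not a proof for any particular model.

```latex
% Combining the adjacent scores x_g and x_g + 1 into a single category:
\begin{align*}
P_{x_g}(\theta) + P_{(x_g+1)}(\theta)
  &= \bigl[ P^*_{x_g}(\theta) - P^*_{(x_g+1)}(\theta) \bigr]
   + \bigl[ P^*_{(x_g+1)}(\theta) - P^*_{(x_g+2)}(\theta) \bigr] \\
  &= P^*_{x_g}(\theta) - P^*_{(x_g+2)}(\theta) .
\end{align*}
```

The combined category is again a difference of two cumulative operating characteristics. If every cumulative operating characteristic belongs to the same family of functions, the combined category therefore stays within the model, which is the property the criterion describes.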

Samejima (1972) distinguished the homogeneous case and the heterogeneous case of the
graded response model. By the homogeneous case we mean a family of models in which the
cumulative operating characteristics, $P^*_{x_g}(\theta)$'s, for $x_g = 1, 2, \ldots, m_g$, are identical in shape,
and these $m_g$ functions are positioned alongside the abscissa in accordance with the item score
$x_g$, whereas in the heterogeneous case not all $P^*_{x_g}(\theta)$'s are identical in shape (see Samejima,
1972).

It  has  been  observed  that  models  in  the  homogeneous  case  tend  to  satisfy  the  above  criteria 
to  a  greater  extent,  whereas  for  those  in  the  heterogeneous  case  fulfillment  of  these  criteria  is 
more  difficult.  For  a  model  in  the  homogeneous  case,  if  the  principle  behind  the  model  and  the 
set of accompanying assumptions are acceptable for the psychological reality in question, and if
it  satisfies  the  unique  maximum  condition,  then  it  can  be  said  to  be  an  appropriate  model  for 
the  following  reasons. 

1.  Additivity  of  the  operating  characteristics  always  holds. 

2. The model is naturally expanded to a continuous response model.

3. If the model satisfies the unique maximum condition, then:

(a) A strict orderliness among the modal points of $P_{x_g}(\theta)$'s holds.

(b) Satisfaction of the unique maximum condition (e.g., in the normal ogive and logistic
models) also holds for combined categories and more finely classified categories.

(c) Satisfaction of the unique maximum condition also holds for the generalized continuous
response model (Samejima, 1973).

For a model in the heterogeneous case, the same is not true, as is exemplified later. In
spite of this handicap, models in the heterogeneous case tend to possess a greater variety in
shapes of the operating characteristics $P_{x_g}(\theta)$'s. Thus a search for a family of models in the
heterogeneous case which satisfies the above five criteria just as well as those models in the
homogeneous case is desirable.

[IV.3]  General  Acceleration  Model  and  a  Specific  Model  in  Which  the  Logistic 
Function  is  Used 

The acceleration model has been proposed as a model in the heterogeneous case developed
with these considerations in mind. The processing function in the acceleration model is given
by

\[
M_{x_g}(\theta) = [\Psi_{x_g}(\theta)]^{\xi_{x_g}} ,
\]

where $\xi_{x_g}$ (> 0) is the step acceleration parameter, $\Psi_{x_g}(\theta)$ is a strictly increasing, five
times differentiable function of $\theta$ with zero and unity as its two asymptotes, and provides the
conditional ratio,

*«,(»)  m  *».(«) 

(&  ’ 

given $\theta$, which decreases with $\theta$. In this model, the value of $\theta$ at which the discrimination
power of $M_{x_g}(\theta)$ is maximal increases with $\xi_{x_g}$. It is assumed that the whole process leading
to the solution of the problem consists of a finite number of clusters, each containing one or
more steps, and within each cluster the parameters in $\Psi_{x_g}(\theta)$ are common.

As a specific model,

\[
\Psi_{x_g}(\theta) = \frac{1}{1 + \exp[-D \alpha_{x_g} (\theta - \beta_{x_g})]} ,
\]

where $D = 1.7$, $\alpha_{x_g}$ (> 0) is the discrimination parameter, and $\beta_{x_g}$ is the location
parameter, has been used. It has been demonstrated that in this model:

1.  additivity  of  the  operating  characteristics  (criterion  2)  practically  holds; 

2.  a  continuous  response  model  can  be  obtained  (criterion  3)  as  the  limiting  situation  in 
which  there  are  infinitely  many  subprocesses  in  each  step; 

3. the unique maximum condition (criterion 4) is satisfied;

4.  orderliness  of  the  modal  points  of  the  operating  characteristics  (criterion  5)  practically 
holds,  except  for  unusual  cases  where  the  unidimensionality  of  the  steps  should  be 
questioned. 
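
The statement that the point of maximal discrimination of the processing function moves upward with the step acceleration parameter can be examined numerically. The sketch below assumes that the processing function is a logistic function raised to the power of the acceleration parameter, with $D = 1.7$; this is our reading of the specific model described above. The function names and the grid are hypothetical choices for illustration.

```python
import numpy as np

D = 1.7  # scaling constant stated in the text

def logistic_psi(theta, alpha, beta):
    """Logistic function with discrimination alpha (> 0) and location beta."""
    return 1.0 / (1.0 + np.exp(-D * alpha * (theta - beta)))

def processing_function(theta, alpha, beta, xi):
    """Assumed acceleration-model form: the logistic function raised to the power xi (> 0)."""
    return logistic_psi(theta, alpha, beta) ** xi

def max_discrimination_theta(alpha, beta, xi, grid=None):
    """Grid value of theta at which the numerical slope of the processing function is largest."""
    if grid is None:
        grid = np.linspace(-6.0, 6.0, 24001)
    slope = np.gradient(processing_function(grid, alpha, beta, xi), grid)
    return float(grid[np.argmax(slope)])
```

Under the assumed form, setting the second derivative to zero gives the maximizer $\beta_{x_g} + \ln \xi_{x_g} / (D \alpha_{x_g})$, which is increasing in $\xi_{x_g}$; the grid search should reproduce this value to within the grid spacing.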

In contrast to the acceleration model, the partial credit model (Masters, 1982) and the generalized
partial credit model (Muraki, 1992) do not have additivity, and thus are inappropriate
as models for typical graded response situations. These models are versions of Bock's nominal
response model (Bock, 1972), which is based on the individual choice behavior. Although they
satisfy the unique maximum condition and the modal points of the operating characteristics
are ordered in accordance with the graded item scores, or partial credits, their lack of additivity,
and of generalizability to continuous models, is detrimental to their use as graded response models.

A  strength  of  the  acceleration  model  lies  in  the  fact  that,  even  if  a  researcher  has  started 
with  an  inappropriate  model,  it  will  be  easy  to  switch  to  the  acceleration  model  (Samejima, 
1994). Figures 4-1 and 4-2 present the operating characteristics of six graded responses following
the partial credit model and those following the acceleration model, respectively. It is obvious
that these two sets of curves are practically indistinguishable. These two sets of curves also
demonstrate that success in curve fitting is not sufficient for validating the model.

For  further  details  of  the  acceleration  model,  the  reader  is  directed  to  Samejima,  1994. 

FIGURE 4-1

Six operating characteristics of graded item scores following the partial credit model.
(Axis label: THETA.)

FIGURE 4-2

Six operating characteristics of graded item scores following the acceleration model. The
parameters were adjusted so that the resulting operating characteristics would be close to those in
Figure 4-1. (Axis label: THETA.)


V.  Further  Research  and  Integration  of  Research  Findings 

In  this  research  period,  some  other  topics  that  were  worked  on  during  the  ONR  funding 
years  were  further  investigated  and  eventually  published,  or  in  press,  in  refereed  journals,  and 
also  some  of  the  research  findings  obtained  during  those  years  were  integrated  and  published, 
or in press, as book chapters and a proceedings chapter. These topics are discussed below.

[V.1] Further Research
[V.1.1]  MLE  BIAS  FUNCTION 

Following  the  bias  function  of  the  maximum  likelihood  estimate  in  the  three-parameter 
logistic  model  proposed  by  Lord  (1983),  the  principal  investigator  expanded  it  for  any  discrete 
responses  (Samejima,  1987),  and  called  it  the  MLE  bias  function.  The  research  was  continued 
and  eventually  written  in  two  articles  shown  as  [4]  and  [5]  of  the  refereed  journal  papers  in 
Section  1  (pages  1  and  2),  dividing  the  contents  into  the  general  case  of  discrete  responses  and 
a  specific  case  of  dichotomous  responses. 

[V.1.2] CRITICAL OBSERVATIONS OF THE TEST INFORMATION FUNCTION
AS  A  MEASURE  OF  LOCAL  ACCURACY 

The  principal  investigator  proposed  the  constant  information  model  (Samejima,  1979a), 
and  using  this  model  observations  were  made  with  respect  to  the  speed  of  convergence  of  the 
conditional  distribution  of  the  maximum  likelihood  estimate  of  ability,  given  its  true  value,  as 
the  number  of  items  increases,  to  the  asymptotic  normality  (Samejima,  1979b).  Based  on  the 
results  which  indicated  that  there  were  substantial  differences  in  the  speed  of  convergence  to 
the  asymptotic  normality  depending  on  the  fixed  levels  of  ability,  critical  observations  of  the 
test  information  function  as  a  measure  of  local  accuracy  in  ability  estimation  were  written  in 
an  article  shown  as  [6]  of  the  refereed  journal  papers  in  Section  1  (page  2). 

[V.1.3]  PLAUSIBILITY  FUNCTIONS  OF  DISTRACTORS 

Using  the  Simple  Sum  Procedure  of  the  Conditional  P.D.F.  Approach  (Samejima,  1981, 
1988,  1990c),  the  operating  characteristics  of  the  distractors,  called  plausibility  functions ,  of 
the  multiple-choice  test  items  of  the  Level  11  Vocabulary  Subtest  of  the  Iowa  Tests  of  Basic 
Skills were estimated (Samejima, 1984). The results showed differential information from the
separate  distractors  for  most  items,  which  can  be  used  in  ability  estimation  so  that  accuracies 
in  estimation  will  be  increased.  These  results  were  summarized  and  written  in  a  paper  under 
the  title  shown  as  [8]  of  the  refereed  journal  papers  in  Section  1  (page  2). 

[V.1.4]  ESTIMATION  OF  RELIABILITY  COEFFICIENTS  USING  THE  TEST 
INFORMATION  FUNCTION  AND  ITS  MODIFICATIONS 

While classical mental test theory is population-bound, latent trait models are population-free.
Despite the fact that the reliability coefficient of a test in classical mental test theory is a property
of the population of individuals as well as of the test itself, it is still accepted as a magic number
that solely belongs to the test. The reliability coefficients can be predicted for different ability
distributions (Samejima, 1990b), and they clearly differ from one distribution to another.
Predictions were made using the test information function and also its two modifications
(Samejima, 1990a), and the results were compared. These findings were written in a paper
under the title shown as [9] of the refereed journal papers in Section 1 (page 2).
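
The dependence of the predicted reliability on the ability distribution can be illustrated with a small sketch. The formula used here is an assumption, not taken from this report: the error variance at a given ability level is approximated by the reciprocal of the test information function, and reliability is taken as true variance over true variance plus average error variance. The function name, the normal ability distribution and the constant information function in the test are likewise hypothetical.

```python
import numpy as np

def predicted_reliability(info_fn, mean, sd, n_points=4001, width=8.0):
    """
    Assumed approximation: Var(theta) / (Var(theta) + E[1 / I(theta)]),
    with the expectation taken over a normal ability distribution.
    info_fn must accept an array of theta values and return positive information values.
    """
    theta = np.linspace(mean - width * sd, mean + width * sd, n_points)
    density = np.exp(-0.5 * ((theta - mean) / sd) ** 2)
    weights = density / density.sum()            # discrete weights summing to one
    error_var = np.sum(weights / info_fn(theta))  # average of 1 / I(theta)
    center = np.sum(weights * theta)
    true_var = np.sum(weights * (theta - center) ** 2)
    return float(true_var / (true_var + error_var))
```

With the same test, and hence the same information function, a more homogeneous group of examinees yields a lower predicted coefficient under this approximation, which is the sense in which the coefficient is population-bound.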

[V.2]  Integration  of  Research  Findings 

[V.2.1]  ROLES  OF  FISHER  TYPE  INFORMATION  IN  LATENT  TRAIT  MODELS 

The roles of Fisher type information are important in latent trait models. They were integrated
in a book chapter under the title shown as [1] of Section 1 (page 1). The topics include
weakly parallel tests (Samejima, 1977), the test information function and its two modifications
(Samejima, 1990a), predictions of the reliability coefficient and the standard error of estimation
(Samejima, 1990b), equally discriminating ability scale (Samejima, 1981), nonparametric estimation
of operating characteristics (Samejima, 1981, 1988, 1990c), constancy in the amount of
information provided by a single dichotomous item (Samejima, 1979a), constant information
model (Samejima, 1979a, 1979b), and the MLE bias function (Samejima, 1987), among others.

[V.2.2]  HUMAN  PSYCHOLOGICAL  BEHAVIOR 

Human psychological behavior viewed from latent trait models was summarized and introduced
to electronic engineering researchers working on neuro fuzzy control, and written as a
proceedings chapter under the title shown as [11] of Section 1 (page 2). Among others, the paper
includes the comprehensive methodologies for cognitive diagnosis using latent trait models,
which were developed by the principal investigator.

[V.2.3]  GRADED  RESPONSE  MODEL 

The  general  theoretical  framework  of  the  graded  response  model  and  specific  models  such  as 
the  normal  ogive  and  logistic  model  (Samejima,  1969,  1972),  the  partial  credit  model  (Masters, 
1982),  the  generalized  partial  credit  model  (Muraki,  1992),  the  acceleration  model  (Samejima, 
1994),  etc.,  were  introduced  and  discussed  in  a  book  chapter  under  the  title  shown  as  [3]  of 
Section  1  (page  1). 


VI. Discussion


The  author  was  too  busy  conducting  research  and  writing  research  reports  in  her  previous 
contract  periods  with  the  Office  of  Naval  Research  (N00014-77-C-0360,  N00014-81-C-0569, 
N00014-87-K-0320)  to  write  the  research  findings  in  the  forms  of  refereed  journal  papers.  It 
was  her  pleasure  that  in  the  second  half  of  the  present  contract  period  she  could  write  papers 
on  her  research  outcomes  obtained  in  the  past  years  for  refereed  journals,  book  chapters,  etc. 
Some  of  them  were  already  published,  and  others  are  in  press  or  accepted,  as  was  described  in 
Section  1. 

Many  other  topics  are  still  left  unpublished  in  refereed  journals,  however,  although  they  were 
printed  in  ONR  research  reports  in  the  past  years.  They  include  the  two  topics  mentioned  in 
Section  1,  that  is,  modified  test  information  functions  and  the  model  for  partly  continuous  and 
partly  discrete  responses.  In  addition  to  them,  there  are  validity  measures  in  latent  trait  models, 
a  latent  trait  model  for  differential  strategies,  a  family  of  models  for  multiple-choice  test  items, 
various  outcomes  from  using  the  nonparametric  approach  to  the  estimation  of  the  operating 
characteristic,  computerized  adaptive  testing,  and  the  practical  usefulness  of  the  method  of 
moments  for  fitting  polynomials  collaborated  with  one  of  my  former  research  assistants,  Mr. 
Philip  Livingston.  Research  will  be  supplemented  on  these  topics,  and  they  will  eventually  be 
published  in  refereed  journals. 

Many of these papers, published or unpublished, include theories and methodologies which
will find their roles in cognitive diagnosis and assessment using controlled, computerized experiments
with constructed responses, as was described in the previous sections of this final
research report. Thus theory, methodologies and technologies necessary for cognitive assessment
are ready for practical applications, provided that a sizable research fund is available to
make the best use of advanced computer technologies.

It  is  the  author’s  hope  that  the  outcomes  of  the  present  research  period,  and  of  the  previous 
ones  started  in  1977,  will  be  used  in  the  future,  especially  in  cognitive  diagnosis  and  assessment. 
The  author  believes  that  they  will  contribute  to  the  advancement  of  psychology  in  depth  as 
well  as  in  perspectives. 